Compare commits


1008 Commits

Author SHA1 Message Date
f0b066eae8 Release v0.13.0 (#2093) 2024-09-25 13:09:08 +02:00
8f39708650 ENH: Better DoRA check in mixed adapter batch inference (#2089)
This is a bit of an edge case, but I noticed this while working on
something else.

PEFT allows mixed batch adapter inference, i.e. when predicting, the
same batch can use different adapters by passing the adapter_names
argument. However, this is not supported for DoRA (yet), so there is a
check that raises an error if DoRA is used.

Previously, this check inspected all adapters for DoRA, even if those
adapters were not being used in adapter_names. This was unnecessarily
strict; with this PR, we only check the adapters that are actually
being used.
2024-09-24 10:16:31 +02:00
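A minimal sketch of the mixed adapter batch inference API referenced in #2089 above; the adapter names, paths, and base model are placeholders. If any adapter listed in adapter_names used DoRA, the check described in this commit would raise.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Load two LoRA adapters onto the same base model (placeholder paths/names).
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")
model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

inputs = tokenizer(["first prompt", "second prompt"], return_tensors="pt", padding=True)
# Each sample in the batch is routed through the adapter named at its index;
# the special name "__base__" runs a sample through the base model only.
outputs = model.generate(**inputs, adapter_names=["adapter_a", "adapter_b"])
```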
f4cf170a9c DOC Docstring of load_adapter, type annotation (#2087) 2024-09-23 11:18:24 +02:00
b67c9b64fd FIX: Bug in find_minimal_target_modules (#2083)
This bug was reported by Sayak and would occur if a required suffix itself
ended with a string that had already been determined to be a required
suffix, in which case the new required suffix would not be added.

The fix consists of prefixing a "." to the suffix before checking if it is
required or not.

On top of this, the algorithm has been changed to be deterministic.
Previously, it was not deterministic because a dictionary that was
looped over was built from a set, and sets don't guarantee order. This
would result in the loop being in arbitrary order.

As long as the algorithm is 100% correct, the order should not matter.
But when we do find bugs like this one, the order does matter. We don't want
such bugs to be flaky; therefore, it is best to sort the dict and remove the
randomness from the function.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-23 11:16:29 +02:00
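A simplified sketch of the idea behind the fix in #2083; this is not PEFT's actual find_minimal_target_modules implementation, and the function name and example module names are hypothetical. It illustrates the "." prefix check (matching only on module-name boundaries) and the sorted, deterministic iteration order.

```python
def minimal_suffixes(required, others):
    """Find a small set of suffixes matching all required module names but no others."""
    chosen = set()
    for name in sorted(required):  # sorting keeps the result deterministic
        # Skip if an already chosen suffix covers this name; the "." prefix
        # ensures we only match on module-name boundaries.
        if any(name == s or name.endswith("." + s) for s in chosen):
            continue
        parts = name.split(".")
        # Try progressively longer suffixes: "q_proj", "self_attn.q_proj", ...
        for i in range(len(parts) - 1, -1, -1):
            suffix = ".".join(parts[i:])
            if not any(o == suffix or o.endswith("." + suffix) for o in others):
                chosen.add(suffix)
                break
    return chosen


# Example with hypothetical module names:
# minimal_suffixes(
#     ["model.layers.0.self_attn.q_proj", "model.layers.1.self_attn.q_proj"],
#     ["model.layers.0.mlp.up_proj"],
# )  -> {"q_proj"}
```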
5efeba1856 ENH: Add default target layers for gemma2 architecture (#2078)
Google's gemma 2 models have a slightly different architecture than
gemma 1 and thus a different model_type attribute. This PR adds default
target layers for gemma 2 that correspond to the default target layers of
gemma 1.

LayerNorm tuning adds one more LN layer.
2024-09-23 11:15:08 +02:00
af275d2d42 ENH: Allow empty initialization of adapter weight (#1961)
This PR allows initializing the adapter weights as empty, i.e. on the meta
device, by passing low_cpu_mem_usage=True.

Why would this be useful? For PEFT training, it is indeed not useful, as
we need the real weights in order to train the model. However, when
loading a trained PEFT adapter, it is unnecessary to initialize the
adapters for real, as we override them with the loaded weights later.

In the grand scheme of things, loading the base model will typically be
much slower, but if the user loads, say, dozens of adapters, the
overhead could add up. Of course, besides loading the model, this has no
performance impact and is thus not a high priority feature.

For the time being, this is completely opt-in. However, it should be safe to
make this the default for loading adapters. Therefore, in the future we may
change the default there.

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
2024-09-23 11:13:51 +02:00
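A minimal sketch of the opt-in behavior added in #1961; the base model and adapter paths are placeholders.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# Adapter weights are created on the meta device and then replaced by the
# loaded weights, skipping the unnecessary real initialization.
model = PeftModel.from_pretrained(base, "path/to/adapter", low_cpu_mem_usage=True)
# The same flag is available when loading further adapters:
model.load_adapter("path/to/other_adapter", adapter_name="other", low_cpu_mem_usage=True)
```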
9bc670eafb MNT Update author email in setup.py (#2086) 2024-09-23 10:43:57 +02:00
5d944589d2 ENH Expose bias of ModulesToSaveWrapper (#2081) 2024-09-20 19:35:24 +02:00
152ed70b00 ENH PiSSA/OLoRA: Preserve original config on save (#2077)
Resolves #2075

When saving PiSSA or OLoRA with the option to convert to normal LoRA,
the LoRA weight shapes change, which means that some values like r and
alpha need to be adjusted in the saved PEFT config. However, these
modifications should be limited to the saved config, while the loaded
config should stay the same.

This PR implements this change by creating a copy of the config before
modifying it.
2024-09-20 12:11:24 +02:00
f5dd2acfed TST Skip some quantization tests on XPU (#2074)
Eetq/hqq/aqlm don't support XPU yet.
2024-09-18 11:27:19 +02:00
3b2ebf1ba1 FIX Bug that prevents BOFT from loading 2 adapters (#2068)
There was a bug in BOFT that made it impossible in some circumstances to
load more than one adapter (creating more than 1 adapter was possible
though). This was because a code path that adjusts
boft_n_butterfly_factor was only visited when creating a fresh adapter,
but not when updating with the 2nd adapter. This was fixed by moving
this code path from the BOFT layer's __init__ method to update_layer.

A test for loading multiple adapters was added. Since this was a gap in
our test suite, this test will be applied to all appropriate PEFT
methods, not only BOFT, but the other methods are all passing without
needing further changes.

For good measure, I also added BOFT to the test suite that checks
multiple active adapters. These tests would have also passed without the
fix in this PR, since these tests do not load multiple adapters but
instead create them, which always worked. Still it's better to have
these tests as well.
2024-09-18 11:19:16 +02:00
adf0a1dc96 ENH Multi adapters in same batch: modules_to_save (#1990)
Extend the functionality of having different adapters in the same batch to also
work with `modules_to_save`.
2024-09-17 13:50:47 +02:00
18f3efe5c0 MNT Update deprecated evaluation_strategy (#1664)
In docs and examples, use eval_strategy instead of evaluation_strategy, which is
deprecated.
2024-09-13 18:01:26 +02:00
4a8dedb2a7 FIX Command line args in PiSSA preprocess (#2053)
Fix bug in parsing command line arguments in the PiSSA preprocess.py script from
the PiSSA example.
2024-09-13 13:59:27 +02:00
25202271bc ENH BOFT don't save boft_P buffer (#2050)
The buffer does not need to be part of the checkpoint; by making it
non-persistent, the file size can be greatly reduced.
2024-09-13 13:56:47 +02:00
214f891cd2 MAINT: Give stale bot permissions for PRs too (#2064) 2024-09-12 12:18:20 -04:00
7868d0372b MNT Permission for GH token in stale.yml (#2061) 2024-09-11 12:36:25 +02:00
734ea9a014 TST Make X-LoRA tests faster (#2059)
After some recent optimizations, the X-LoRA tests are now the slowest
ones. Part of that is that the lora adapters are re-created for each
test. By changing the fixture scope, they're now only created once. I
think this should be safe, as these files are not modified in the tests.

I also enabled test_scalings_logging_methods with the latest
transformers to ensure that this test also passes.
2024-09-11 12:13:24 +02:00
54be5a3db6 TST Speed up vision model tests (#2058)
The HRA vision model test is extremely slow on CI (> 600 sec, 50% of
total time). This change speeds up the test by using a smaller ResNet
model to run the tests.

It's still not clear why HRA was so slow specifically -- LoRA is 40x
faster -- but that can be fixed separately.
2024-09-10 16:15:51 +02:00
b180ae46f8 TST Fewer inference steps for stable diffusion (#2051)
Reduce the number of inference steps for the stable diffusion tests. These
tests are the slowest ones on CI; this should save ~3 min on average.
2024-09-06 09:57:56 +02:00
31fbbd2203 FIX TST Scalings logging test latest transformers (#2042)
Fix test for latest transformers, skip for earlier versions.
2024-09-05 14:50:46 +02:00
c9f7240afc FEAT Add VB-LoRA (#2039)
Implements "VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector
Banks"

https://arxiv.org/abs/2405.15179
2024-09-04 11:02:34 +02:00
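A hedged sketch of configuring the new VB-LoRA method; the config field values, target modules, and base model are illustrative, and the field names (num_vectors, vector_length) follow the PEFT documentation for this method.

```python
from transformers import AutoModelForCausalLM
from peft import VBLoRAConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = VBLoRAConfig(
    task_type="CAUSAL_LM",
    r=4,                    # low-rank dimension
    num_vectors=256,        # size of the shared vector bank
    vector_length=256,      # length of each vector; must divide the hidden size
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```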
95b39642fb FIX: Small numerical discrepancy for p-tuning after loading the model (#2047)
There is a small numerical discrepancy between the outputs of a p-tuning
model before and after loading. Even though it is small, it can still
affect generations, so this PR eliminates it.

As an example, without the fix, this is the difference in logits for
opt-125m:

>       torch.testing.assert_close(output_loaded, output_peft)
E       AssertionError: Tensor-likes are not close!
E
E       Mismatched elements: 30 / 10557120 (0.0%)
E       Greatest absolute difference: 1.1086463928222656e-05 at index (0, 9, 9314) (up to 1e-05 allowed)
E       Greatest relative difference: 0.00021288332936819643 at index (0, 9, 9314) (up to 1.3e-06 allowed)

Details about how this comes about are explained here:

https://github.com/huggingface/peft/issues/2043#issuecomment-2321522577

The gist of it is that if we take a single sample, repeat it X times,
and then forward it through a model (which is the training path in
p-tuning), we would expect the same output as if we forwarded this
sample only once and repeated the output X times (the inference path for
p-tuning). However, for sufficiently large models, the two approaches
can have tiny differences.

With the fixed approach, there is no difference between training and
inference code paths when it comes to this. The new code should also be
slightly more compute efficient, but in practice will not make a
noticeable difference.
2024-09-03 16:52:06 +02:00
37b9c5c74b FIX: Error with OLoRA init when using bnb (#2011) 2024-09-03 14:08:25 +02:00
01275b4cb3 ENH: Faster adapter loading if there are a lot of target modules (#2045)
This is an optimization to reduce the number of entries in the
target_modules list. The reason is that in some circumstances,
target_modules can contain hundreds of entries. Since each target module
is checked against each module of the net (which can be thousands), this
can become quite expensive when many adapters are being added. Often,
the target_modules can be condensed in such a case, which speeds up the
process.

A context in which this can happen is when diffusers loads non-PEFT
LoRAs. As there is no meta info on target_modules in that case, they are
just inferred by listing all keys from the state_dict, which can be
quite a lot. See: https://github.com/huggingface/diffusers/issues/9297

As shown there the speed improvements for loading many diffusers LoRAs
can be substantial. When loading 30 adapters, the time would go up from
0.6 sec per adapter to 3 sec per adapter. With this fix, the time goes
up from 0.6 sec per adapter to 1 sec per adapter.

As there is a small chance for undiscovered bugs, we apply this
optimization only if the list of target_modules is sufficiently big.
2024-09-02 12:59:51 +02:00
679bcd8777 ENH Warn if using tied target modules (#2025)
When users are targeting tied weights (e.g. embedding and LM head),
merging the adapter will lead to errors. Now users are warned about the
possibility when they create such a PEFT model and also when they try to
merge.
2024-08-29 10:51:13 +02:00
850eeb5c3a FIX Pre-commit version in config (#2034) 2024-08-26 11:50:02 +02:00
5996d39408 TST Enable more tests in XPU (#2036) 2024-08-26 11:49:18 +02:00
900f96c40d [Add] DoRA Embedding (#2006) 2024-08-23 20:20:42 +02:00
c3b63ce2c4 ENH Test and DoRA compatibility with XPU 2024-08-23 16:01:50 +02:00
1a5d0f8151 FIX: Don't target the classification head when using target_modules="all-linear" (#2033)
Fixes #2027

When using a transformers sequence classification model,
target_modules="all-linear" should not wrap the classification head with
LoRA. This is because it is already wrapped with ModulesToSave, i.e. it
will be fully fine-tuned, which is the generally desired behavior.

Before this bug fix, the classification head would be double-wrapped.
With #2028, this now raises an error. With this PR, it is avoided
completely. Still, keeping #2028 is good because it helps prevent other
situations where double-wrapping might occur due to misconfiguration.

Note that there is no foolproof method to detect the classification
head; we have to rely on the transformers naming convention.
2024-08-23 16:00:43 +02:00
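A minimal sketch of the setup fixed by #2033; the base model is a placeholder. With task_type="SEQ_CLS", the classification head is handled via ModulesToSave and is no longer also wrapped by LoRA when target_modules="all-linear" is used.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained("facebook/opt-125m", num_labels=2)
# "all-linear" targets the linear layers of the backbone; the classification
# head is fully fine-tuned via modules_to_save instead of being LoRA-wrapped.
config = LoraConfig(task_type="SEQ_CLS", target_modules="all-linear")
model = get_peft_model(base, config)
```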
f3c7c6e5c1 ENH Raise error when applying modules_to_save on tuner layer (#2028)
Relates to #2027

Normally, when selecting the layers for fine-tuning, PEFT already
ensures that the same layer is not targeted for both parameter-efficient
fine-tuning (e.g. LoRA layer) and full fine-tuning (via
modules_to_save), as that makes no sense.

However, there is a loophole when the modules_to_save is applied ex
post. This happens for instance when having a task type like sequence
classification, where PEFT will automatically add the classification head
to modules_to_save for the user. This loophole is now closed by adding a
check to ModulesToSaveWrapper that validates that the targeted layer is
not a tuner layer.

This does not fully resolve #2027 but will raise an early error in the
future to avoid confusion.

On top of this, the error message inside of
ModulesToSaveWrapper.check_module has been slightly adjusted.
Previously, the class name would be used, which can be confusing. E.g.
for LoRA, the class name of the linear LoRA layer is just "Linear",
which looks the same as nn.Linear. Therefore, the full name is now
shown.
2024-08-22 17:10:39 +02:00
8fcb1951a5 MAINT: Update ruff version to ~0.6.1 (#1965)
Moving to ruff ~0.6.1. Changes:

- type comparisons now require is, e.g. type(x) is str instead of type(x) == str
- remove overridden class attribute active_adapter
- remove secondary import of fbd_cuda

Omit jupyter notebooks for now. We can think about adding that in a
separate PR.
2024-08-22 15:23:23 +02:00
fa218e1942 TST test_mixed_adapter_batches_lora_opt_timing on XPU (#2021) 2024-08-21 15:10:19 +02:00
6c832c1dd4 TST Make TestModelAndLayerStatus device-agnostic (#2026) 2024-08-21 12:43:35 +02:00
95821e5ce4 ENH: Better error msg for replace_lora_weights_loftq when using a local model. (#2022)
Resolves #2020

If users want to use a local model, they need to pass the model_path
argument. The error message now says so.
2024-08-21 10:10:54 +02:00
25ab6c9bb2 TST Enable regression tests on XPU (#2019) 2024-08-20 16:13:59 +02:00
b4cf1b3c46 CI Remove regression tests from BNB CI (#2024)
This is a test to see if the BNB CI for multi-backend single-GPU passes
if regression tests are disabled.
2024-08-20 14:15:37 +02:00
eb5eb6efb5 TST Enable test_vera_dtypes on XPU with bf16 (#2017) 2024-08-20 11:25:44 +02:00
f71e89f771 FIX Deprecated params/funcs in X-LoRA example (#2010) 2024-08-20 11:24:38 +02:00
e8ba7de573 CI Activate single core multi backend bnb tests (#2008)
See #1866 for context.

Let's check if this issue has resolved itself by now.
2024-08-16 17:19:20 +02:00
0222450f44 TST: Potentially Skip 8bit bnb regression test if compute capability is too low (#1998)
* TST Potentially Skip 8bit bnb regression test

The 8bit bnb LoRA regression test results are dependent on the
underlying compute capability. The logits are slightly different
depending on the version (up to 0.5 abs diff). Therefore, we now check
the compute capability for this test and skip it if it's too low. This
check may require updating if the hardware of the CI worker is updated.

Note that I have already invalidated the old regression artifacts and
created a new one.

* Fix pytest skip to work without cuda

* Instead of skipping, add a comment to explain

After internal discussion, we think this is the most practical solution
for the time being.
2024-08-16 17:18:25 +02:00
4c3a76fa68 FIX DOC Update X-LoRA docs, some bugfixes (#2002)
Bugs with dtype and loading of LoRA adapters.
2024-08-15 15:29:32 +02:00
670d0fac31 FIX CI Correctly report outcome of bnb import test (#2007) 2024-08-14 20:14:15 +02:00
22f042a107 ENH: Warn when a user provided model name in the config renamed (#2004)
Resolves #2001

In PEFT, users can provide a custom base_model_name_or_path argument to
the PEFT config. However, this value is overridden by the model's
name_or_path attribute. This can be surprising for users. Therefore,
there is now a warning about this.

To see why that can be relevant, check the original issue.
2024-08-14 15:42:58 +02:00
d6e772f192 TST Add LNTuningConfig and LoKrConfig to tests (#2005)
These two configs were missing in test_config.py. Also, reordered the
list of all config classes to be sorted, which makes it easier to spot
missing configs.
2024-08-14 15:42:32 +02:00
042123465c DOC Fix typos in lora.md (#2003) 2024-08-13 15:15:03 +02:00
41c274ecac FIX Import error in BOFT half precision test (#1995) 2024-08-08 15:15:47 +02:00
9988cb9d00 FIX BOFT, OFT saving merged layers (#1994)
Error occurred with safetensors when weights are not contiguous.
2024-08-07 19:26:33 +02:00
fcac30bef5 MAINT Default to loading weights_only for torch (#1993)
The torch.load function allows passing weights_only=True, which is more
secure but may break loading of checkpoints that contain more than just
weights. For PEFT, this should not be the case, so the switch should just
work.

By making the switch now, we can find out early if there are any
problems, as torch.load will default to True in the future.
2024-08-07 19:16:55 +02:00
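A minimal sketch of the switch described above; the checkpoint file name is a placeholder. PEFT adapter checkpoints contain only tensors, so loading with weights_only=True should be a drop-in change.

```python
import torch

# weights_only=True refuses to unpickle arbitrary Python objects and is
# therefore safer; it works as long as the file contains only tensors.
state_dict = torch.load("adapter_model.bin", map_location="cpu", weights_only=True)
```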
2a5d3132e9 ENH Small updates to helper.rescale_adapter_scale (#1989)
Some renaming, better docs.
2024-08-07 14:51:35 +02:00
c869664891 FIX BOFT mixed precision (#1925) 2024-08-07 14:12:34 +02:00
4611034ff8 FIX: Adjust transformers version check for bloom (#1992)
The fix to the bloom architecture was not actually released in
transformers 4.43.3, which makes the version check invalid. Instead, we now
check an attribute on the BloomPreTrainedModel.
2024-08-06 13:40:14 +02:00
b9260305e3 FIX Docker build CI (#1987)
Signed-off-by: Adrien <adrien@huggingface.co>
2024-08-02 16:51:48 +02:00
f51428313f DOC Docs and examples for X-LoRA (#1970) 2024-08-02 12:35:14 +02:00
9a087823c6 DOC Small fixes for HQQ and section title (#1986)
Changed:

- Helper section had placeholder title
- `device` is not a valid argument to `from_pretrained`
- Excess empty lines
- Move helpers section
2024-08-02 12:33:29 +02:00
46f78978f1 FEAT Context manager for scaling LoRA (#1951) 2024-08-01 17:21:55 +02:00
269aba5303 ENH AdaLoRA: warn when user use r argument (#1981)
For AdaLoRA, init_r is the correct one to use.
2024-08-01 12:24:42 +02:00
52a4ac9c2f ENH Faster bf16 merging on CPU (#1978)
Cast to fp32, as bf16 can be very slow on some CPUs.

This is already done for fp16.
2024-07-31 17:51:46 +02:00
c874ba3f1b CHORE Update CI configuration for workflows (#1985)
Signed-off-by: Adrien <adrien@huggingface.co>
2024-07-31 16:08:58 +02:00
f13d860e9f FIX Loading adapter honors offline mode (#1976)
HF_HUB_OFFLINE=1 was not honored when trying to load an adapter. This is
now fixed.
2024-07-30 16:11:27 +02:00
f6d3e38601 FIX active_adapters for transformers models (#1975)
Fixes the error reported here:

https://github.com/huggingface/transformers/pull/30790#issuecomment-2253808249

Unfortunately, transformers models have an active_adapters method but
it's 1) not a property and 2) calling it fails because the base
model (usually) has no loaded adapter. The base model can be a
transformers model for prompt learning, where the base model is not
wrapped in a LoraModel or similar. Therefore, this special case needs to
be handled separately.
2024-07-30 15:14:28 +02:00
7e7b55880e FIX: lora+: include lr in optimizer kwargs (#1973) 2024-07-30 14:20:04 +02:00
1b16753a6a ENH Update VeRA preconfigured models (#1941)
Some preconfigured models like mistral previously did not work with VeRA
because their weight shapes were not identical. However, since #1817, this
is no longer a requirement. Therefore, this commented-out code can now be
uncommented.

I have tested mistral and gemma and they worked. I haven't tested btlm
and mixtral but with the update, I'm pretty sure they will work too.
2024-07-30 08:15:53 +05:30
27833a2e60 FIX: New bloom changes breaking prompt learning (#1969)
Bloom had two dimensions of the attention layer transposed (compared to
all other transformers models), which was fixed by:

https://github.com/huggingface/transformers/pull/31445

Therefore, for future transformers versions, skip the special handling
in PEFT.

There was also an issue that prompt injection did not take place when
past_key_values was an empty Cache object. This should now
hopefully work as expected.
2024-07-29 18:25:41 +02:00
273acf059e FEAT: Add LoRA+ (#1915)
Add LoRA+: Efficient Low Rank Adaptation of Large Models

https://arxiv.org/abs/2402.12354

Call create_loraplus_optimizer to initialize an optimizer with optimizer
parameters that are especially effective for LoRA training.

Builds upon this code base:

https://github.com/nikhil-ghosh-berkeley/loraplus

---------

Co-authored-by: moghadas76 <s.m.moghadas2012@gmail.com>
Co-authored-by: Chris Hua <stillmatic@users.noreply.github.com>
2024-07-29 12:50:30 +02:00
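A hedged sketch of creating a LoRA+ optimizer as described in #1915; the learning rate, ratio, and base model are illustrative, and the usage follows the PEFT documentation for create_loraplus_optimizer.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_loraplus_optimizer

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))

optimizer = create_loraplus_optimizer(
    model=model,
    optimizer_cls=torch.optim.AdamW,
    lr=5e-5,
    loraplus_lr_ratio=16,  # LoRA B matrices get lr * ratio, per the LoRA+ paper
)
```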
296fbcde3e FIX Prefix tuning if past_key_values is passed (#1942)
There was an error with prefix tuning when some models like Llava passed
past_key_values explicitly, even if it was None, because that argument
ended up being passed twice (once explicitly, once via kwargs). This is now
fixed.
2024-07-29 12:46:54 +02:00
f2b6d13f1d CI Fix Windows permission error on merge test (#1952)
For some reason, Windows CI suddenly started throwing permission
errors on test_merge_layers. These errors occur when using the
TempDirectory() context manager, which raises a PermissionError on
Windows when it tries to clean up after itself. Therefore, this context
manager is now avoided in favor of manual clean up.

More context:

I investigated this issue first in #1947. My suspicion that this could
be caused by a new pytest version was not confirmed. Maybe the reason is
that GH rolled out a new Windows worker, not sure.

Also note that this is not the first time that this workaround is
required, e.g. also here:

e6cd24c907/tests/test_custom_models.py (L1465)
2024-07-25 14:02:34 +02:00
8aacb993e7 Bump version to 0.12.1.dev0 (#1950) 2024-07-25 13:39:39 +02:00
e6cd24c907 Release v0.12.0 (#1946)
Also: Fix small error in doc: mentions wrong version
2024-07-24 13:13:40 +02:00
05f57e94ef PiSSA, OLoRA: Delete initial adapter after conversion instead of the active adapter (#1933)
Resolves #1860

As discussed in that issue, it's not user friendly to delete the default
adapter of a PiSSA/OLoRA model after calling save_pretrained with weight
conversion. It is much more intuitive to delete the initial
adapter instead, since it is loaded inside the method and not by the
user, so it's really an implementation detail.

Apart from this, I made the following related changes:

- Put everything in a try ... finally to ensure that the initial adapter
  does not hang around if there is an error (thus not hogging memory).
- Renamed initial_adapter to initial_adapter_name, to make it clear that
  this is the name and not the adapter itself.
2024-07-24 12:55:56 +02:00
2ce83e05c1 FIX Decrease memory overhead of merging (#1944) 2024-07-23 20:24:05 +02:00
ebcd0792b8 [WIP] ENH Add support for Qwen2 (#1906)
* [WIP] ENH Add support for Qwen2

Add Qwen2 to default target modules, use tiny Qwen2 in tests.

* Add target_modules for FourierFT

* Skip Qwen2 + weighted combination test

It fails when SVD is involved. See:
https://github.com/huggingface/peft/pull/1901#issuecomment-2235731685

---------

Co-authored-by: BenjaminBossan <b.bossan@gmail.com>
2024-07-23 15:04:13 +05:30
ba75bb14d1 FIX: More VeRA tests, fix tests, more checks (#1900)
* FIX More VeRA tests, fix tests, more checks

- Fixes incorrect config for VeRA in a test
- Add VeRA to multi-adapter tests
- Add more checks on the VeRA A/B shapes

The latter becomes necessary when we add more than one VeRA adapter. The
shapes for VeRA A and B are only determined once, when the first VeRA
adapter is created. After that, they are fixed. However, users may add a
second VeRA adapter. As long as that adapter targets the same layers and
has the same rank, we're good. But if it targets other, bigger layers,
or if it has increased rank, the shapes of VeRA A and/or VeRA B will be
too small, resulting in an error during the forward pass. To prevent
this, we already check the shapes during initialization of the new
adapter and raise an error right away.

* Reviewer feedback: wording, better error message

* Reviewer feedback: Clarify tests

---------

Co-authored-by: BenjaminBossan <b.bossan@gmail.com>
2024-07-22 19:12:15 +05:30
6472061a76 FIX Prefix tuning Grouped-Query Attention (#1901)
Fix prefix tuning when GQA is being used.
2024-07-22 11:46:24 +02:00
e02b938e02 FIX PiSSA & OLoRA with rank/alpha pattern, rslora (#1930)
* FIX PiSSA & OLoRA with rank/alpha pattern, rslora

See https://github.com/huggingface/peft/issues/1929#issuecomment-2230780802

At the moment, when using PiSSA or OLoRA with weight conversion to
restore the original base weights, there is an error when either of
rank_pattern, alpha_pattern, or rslora is being used. This PR fixes
this.

The issue is that we need to double the rank of the LoRA adapter. Right
now, this is done by simply doubling r and alpha. But if rank_pattern
and alpha_pattern are being used, those need to be doubled too.

Furthermore, when using rslora, the scaling is again different, namely
alpha / sqrt(r). This also needs to be adjusted.

Unfortunately, when using rslora with rank_pattern and alpha_pattern,
this gets way more complicated. Since we don't store the scaling in the
state_dict, we would have to resolve all the patterns here to determine
the correct scaling, i.e. reimplement the whole matching and init logic.
This is a lot of work for a very edgy edge case.

Therefore, I opted to raise an error instead. This is not super nice, as
the error is only raised when trying to save the model, i.e. a lot of
time may already have been spent to train the model. But we cannot know
this earlier, so not much can be done.

Overall, this fix is ugly because it further couples unrelated code. For
instance, if we add new init methods that affect the scaling, we need to
remember to change the saving logic accordingly. If anyone has a better
idea, LMK.

* Make style

* Also warn during init if there is a potential

... for saving not to work

* Ensure that GPU tests are run for PiSSA+OLoRA

* Use renamed argument name

* Make style

* Reviewer feedback: Better document the change

* Add clarifying comments to tests
2024-07-19 14:53:38 +05:30
5268495213 FEAT Add HRA: Householder Reflection Adaptation (#1864)
Implements method from https://arxiv.org/abs/2405.17484.
2024-07-16 14:37:32 +02:00
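A hedged sketch of applying the new HRA method; the rank, target modules, and base model are illustrative.

```python
from transformers import AutoModelForCausalLM
from peft import HRAConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# r is the number of Householder reflections applied per targeted layer.
config = HRAConfig(r=8, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()
```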
2aaf9cedbb ENH Sync LoRA tp_layer methods with vanilla LoRA (#1919) 2024-07-16 10:39:36 +02:00
a019f8690d FIX sft script print_trainable_parameters attr lookup (#1928) 2024-07-15 17:09:14 +02:00
2a6402f4b2 DOC Fix typo of encoder_reparameterization_type (#1926) 2024-07-15 12:06:12 +02:00
e72a96f7cf FEAT Add FourierFT Support (#1838)
Add Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

https://arxiv.org/abs/2405.03003

---------

Co-authored-by: zqgao22 <zgaoat@connect.ust.hk>
Co-authored-by: Chaos96 <wangqch7@mail2.sysu.edu.cn>
Co-authored-by: DSAILatHKUST <dsailathkust@163.com>
2024-07-09 12:20:01 +02:00
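A hedged sketch of applying FourierFT; the n_frequency value, target modules, and base model are illustrative, with the field name taken from the PEFT documentation for this method.

```python
from transformers import AutoModelForCausalLM
from peft import FourierFTConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# n_frequency is the number of learnable spectral coefficients per layer.
config = FourierFTConfig(
    n_frequency=1000,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
```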
48e136d9bd FIX: Flaky multitask prompt tuning test fixed by setting the seed (#1908)
Set the seed for the tests test_generate_text_with_other_init and
test_generate_text_with_random_init because otherwise they are
flaky and fail with ~5% probability. Explanation in comment.
2024-07-09 10:05:10 +02:00
58afb34ea0 FEAT Integrate X-LoRA (#1491)
Implements X-LoRA: Mixture of Low-Rank Adapter Experts
Paper: https://arxiv.org/abs/2402.07148
2024-07-05 12:38:18 +02:00
01f1b992eb Example: DNA Language Model. (#1873) 2024-07-05 11:55:26 +02:00
09358aad30 Chore: Docs markdown formatting (#1899) 2024-07-03 18:12:53 +02:00
31c0d85755 FIX DeepSpeed recursion error (#1892)
Happened when accessing attribute before init.
2024-07-03 18:07:31 +02:00
018a1f49c4 FIX TEST Even higher tolerance for AdaLoRA in test (#1898)
See #1897 for more context. The test is still flaky, increasing
tolerance further.
2024-07-02 12:36:03 +02:00
1e2258d7f7 ENH Ephemeral GPU offload support for DoRA (#1857)
Adds the concept of ephemeral GPU offloading, i.e. where data used in
compute-intensive operations is copied onto the GPU before the operation is
performed, after which the result is put back into CPU memory.

This PR adds support in the DoRA initialization code, but the approach
can be applied in a number of places: when the operation is heavily
dominated by CPU compute time rather than by the size of the data to be
transferred, ephemeral transfers add a fairly small VRAM overhead
(depending on the size of the model/adapter) while providing orders of
magnitude speed-ups in certain operations.

For example, a Llama3-8B DoRA adapter with r=64 would put an overhead of
2 x (64 x 4096 x 2 + 4096 x 4096) bytes (assuming fp16), i.e. 33 MB or
so. A Llama3-70B adapter with r=32 would have 2 x (32 x 8192 x 2 + 8192
x 8192) bytes = 130 MB.

By making use of ephemeral GPU offloading, more efficient juggling of
data between GPU and CPU may become possible, i.e. instead of
always loading as much as we can onto the GPU and then enduring the CPU
slowness for whatever happens to not fit in there, we intentionally
leave a (modest) chunk of VRAM for optimizations like these, and the end
result is a much (MUCH) faster experience.
2024-07-02 12:17:45 +02:00
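A hedged sketch of enabling ephemeral GPU offload for DoRA, assuming it is exposed via a runtime config on LoraConfig as in the PEFT DoRA documentation; the base model, rank, and target modules are illustrative.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, LoraRuntimeConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
config = LoraConfig(
    r=64,
    use_dora=True,
    target_modules=["q_proj", "v_proj"],
    # Compute-heavy DoRA initialization steps are moved to the GPU temporarily
    # and the results are copied back to CPU memory afterwards.
    runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=True),
)
model = get_peft_model(base, config)
```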
1e5227ff90 TST Bump absolute tolerance for test (#1891)
The test test_4bit_lora_mixed_adapter_batches_lora allclose can fail on
some systems, even though it passes on others (like CI). Increase the
tolerance slightly to get rid of this.
2024-07-02 11:37:43 +02:00
62122b5add FIX TEST Higher tolerance for AdaLoRA in test (#1897)
The test is flaky on CI, so this PR increases the tolerance to hopefully
fix the flakiness. I cannot reproduce the error locally (neither on GPU
nor CPU), so I'm not 100% sure if this tolerance is enough to make the
test reliable.
2024-07-01 15:42:10 +02:00
9dc53b8fd5 CI Don't fail fast in test matrix (#1896)
Currently, we have fail-fast enabled (the default). Although this is
generally reasonable -- if a test fails in one setting, we probably get
the same failure in other settings -- it is currently an impediment.
This is because we get occasional timeouts when loading models from the
Hub. With fail-fast enabled, if a single setting fails because of
timeouts, all other runs are cancelled, even if they would have passed.
Then we need to retrigger all of them again, creating even more pressure
on the Hub. With fail-fast disabled, we give those other runs a chance
to pass successfully.
2024-07-01 15:04:02 +02:00
db8b76fdb5 DOC DoRA example script & notebook (#1885) 2024-06-28 12:05:53 +02:00
7ffa43b16e FIX Avoid early import of torch extension by BOFT (#1879) 2024-06-26 17:25:26 +02:00
27bc3054a3 FIX sft script: only print trainable params if peft (#1888) 2024-06-26 12:02:35 +02:00
184beaf1d6 FIX Make special LoRA inits DeepSpeed compatible (#1887)
Resolves https://github.com/huggingface/accelerate/issues/2886

Possibly resolves
https://github.com/huggingface/peft/issues/896#issuecomment-2172455458

Some LoRA init methods need to access the base layer weight. Getting
this access can fail or stall in distributed settings. For DeepSpeed,
the weight is now gathered before trying to access it.

Note: Without DeepSpeed, this is a no-op and should thus not have any
disadvantage. We don't have DS in our CI, so this is not tested.

I also made some small changes to OLoRA init to use
self.get_base_layer() instead of self.base_layer.
2024-06-26 11:25:54 +02:00
c9b19bb8f3 FIX Init AdaLoRA to be identity transform (#1884)
Resolves #1836

There was an accidental change in a previous PR that initialized lora_E
as normal, when it should be zeros.
2024-06-25 13:33:28 +02:00
ef23712b13 ENH: LoRA support for dynamically dispatching to custom layers (#1875)
Description

This is an experimental feature with a private API for now. If this
feature finds adoption, I will work on adding an official API.

With this PR, we allow users to register their own LoRA layer types.
This way, they can add their own support for hitherto unsupported layer
types, say nn.Conv3d or nn.LSTM. Without this PR, they can only do that
by creating a PR on PEFT with support for this new type and getting it
merged.

The custom dispatch mechanism also allows users to override existing
layer type mapping. This way, they can, for instance, provide their own
lora.Linear layer type, instead of using the one from PEFT, to adapt
nn.Linear layers.

Implementation

The implementation required only very few changes because we already
have a mechanism for dynamic dispatching for LoRA. It is currently used,
for instance, to dynamically add quantized target layers in case the
right quantization library is installed.

This existing mechanism is now extended to include user provided LoRA
layers if those were passed. These are checked first before checking the
default PEFT supported layers.

What's missing for this to become an official API?

Right now, the main reason why this cannot be an official API is the
question of how to persist the config. In the current implementation, we
add an attribute that is a mapping from target layer type to LoRA layer
type:

config._custom_modules == {CustomBaseLayer: CustomLoraLayer}

The entries of this dict are Python classes. Therefore, they cannot be
json-serialized. We could think of possible ways to serialize
and deserialize custom Python objects, but this is not trivial and
potentially a security risk. Thus I would only really start working on
this if the demand is sufficiently high. At that point, I would also add
a public API instead of requiring the use of a private API.

As is, users can still save and load PEFT models with custom LoRA
layers, they only need to add two lines of code to their scripts, as
documented.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-06-25 11:02:43 +02:00
d716adf31c Update bug-report.yml (#1882) 2024-06-21 16:45:44 +02:00
d37dde61e1 FIX Error when using VeRA with fp16 or bf16 (#1874)
The issue was that we didn't consider BufferDict when auto-casting the
adapter weights to float32 in PR #1706. This has now been addressed.

As #1706 was merged after the latest release, this bug should only
affect users who install from main, so a patch release should not be
needed.

As part of this PR, I also moved the buffer_dict.py up from
peft/tuners/vera to peft/tuners/ (with underscore to make it super clear
that this is not for public usage). This is because we need to use it
several times on a higher level than VeRA.
2024-06-19 13:21:17 +02:00
5364351446 CI Downgrade numpy to <2.0 for Mac and Windows (#1871) 2024-06-18 13:47:29 +02:00
717db6e1c2 CI testing BNB: remove single GPU tests (#1866)
CI testing BNB: remove single GPU tests

In #1859, we tried removing the import checks, but the single-GPU BNB
multi-backend branch is still stuck. Therefore, try commenting out the next
step instead.

Also, add timeout of 60 min. Successful jobs currently take ~30 min. Default
timeout is 360 minutes.
2024-06-18 10:34:24 +02:00
5194aef509 Attempt to fix the red messages (#1868) 2024-06-17 15:34:31 +02:00
25c0fe9a55 FIX fix multitask prompt tuning paper link (#1862) 2024-06-17 10:57:34 +02:00
e0e8204bc3 Update lora_based_methods.md (#1861)
fixed typo in instructions for peft inference
2024-06-17 10:57:27 +02:00
076561bbd3 CI Testing: Remove bnb import check (#1859) 2024-06-14 18:02:27 +02:00
efda766f51 DOC Move helpers section to dev developer guide (#1856)
It was in the "Adapters" section, which doesn't really fit.
2024-06-13 12:44:25 +02:00
d608f8329a DOC FIX Comment about init of LoRA Embedding (#1855)
Fixes #1728
2024-06-13 11:58:26 +02:00
19461353aa Update nightly-bnb.yml (#1854) 2024-06-13 11:40:40 +02:00
3831e06ab5 FIX: Adalora ranknum loaded on wrong device (#1852)
Locally, multiple AdaLoRA-related tests are failing. We did not catch
this in the nightly run because the tests were missing the necessary
pytest marker.

The issue is related to the change in #1742, which made it possible to
load different adapters on different devices.

Although that PR itself was sound, the issue is that for AdaLoRA, one of
its parameters, ranknum, was not listed in the other_param_names and was
thus not moved to the correct device. This oversight is now fixed and
the GPU tests are now passing locally for me.

This PR also adds the missing pytest marker to the test class that was
missing it, so that these errors should be caught by our nightly CI in
the future.
2024-06-13 10:47:49 +02:00
2f5360a7da FEAT Add OLoRA initialization strategy to LoRA (#1828) 2024-06-12 17:46:43 +02:00
8843a767da MNT Upgrade ruff version to ~0.4.8 (#1851)
We currently use ruff v0.2.2, which is quite far behind the latest
version. This has the disadvantage that new contributors will often
install the latest version of ruff and then get CI errors, even though
they ran `make style`.

Here is the full list of changes:

- bump ruff version to ~0.4.8
- update the ruff commands in Makefile (ruff foo/ -> ruff check foo/)
- update coding style of two files that changed with the new ruff
  version
2024-06-12 15:01:45 +02:00
b6af7feb34 DOC Fix PeftMixedModel docstring example #1824 (#1850) 2024-06-12 14:27:14 +02:00
47b3d7422a CI Activate env to prevent bnb import error (#1845)
All bitsandbytes nightly CI runs are currently failing with:

Run python3 -m bitsandbytes
/opt/conda/bin/python3: No module named bitsandbytes

This fix should hopefully solve this, but it's untested.
2024-06-11 10:59:32 +02:00
7b1c08d2b5 ENH Support different layer shapes for VeRA (#1817) 2024-06-10 17:10:56 +02:00
a8286a7bff DOC Describe torch_device in from_pretrained docs (#1843) 2024-06-10 16:01:00 +02:00
683db0fa2c feat(ci): add trufflehog secrets detection (#1841)
* feat(ci): add trufflehog secrets detection

* fix(ci): remove unnecessary permissions
2024-06-10 11:40:36 +02:00
0f89d34d82 Fix broken messages (#1842) 2024-06-10 11:21:48 +02:00
0b40d1a304 Workflow / Bnb: Add a mechanism to inform us if the import fails (#1830)
* Update nightly-bnb.yml

* Update nightly-bnb.yml

* Update .github/workflows/nightly-bnb.yml

* Update .github/workflows/nightly-bnb.yml
2024-06-07 16:38:10 +02:00
03798a9143 FIX Failing Llama tests due to new kv cache (#1832)
The issue is that past_key_values can now be an instance of
DynamicCache. Therefore, just indexing into it won't work anymore. The
solution is to check the type and if it's not a tuple/list, use the methods
on the cache object instead.
2024-06-06 15:49:59 +02:00
d33c1f118e fix doc typo (#1833) 2024-06-06 15:34:10 +02:00
63a536b18e TST Make tests pass on Cambricon MLUs (#1747)
Small adjustments to tests to make them pass on Cambricon MLUs (mostly
tolerances). Note that we have no MLU test runners for PEFT, so we have to
rely on others to run these tests.
2024-06-06 10:44:03 +02:00
ad8f7cb59e Update build_docker_images.yml (#1823) 2024-06-04 13:34:37 +02:00
3538e8ac7d FIX CI: Install pytest-reportlog package (#1822) 2024-06-04 13:09:09 +02:00
b213ea5fb9 Update tests-main.yml (#1821) 2024-06-04 12:31:31 +02:00
7ed94f3269 FIX CI: Remove potentially problematic git command (#1820)
See if this fixes the error in the workflow.

> fatal: detected dubious ownership in repository at '/__w/peft/peft'
2024-06-04 12:18:37 +02:00
a0788a3f92 Refactor to make DoRA and QDoRA work with FSDP (#1806)
This PR moves all the DoRA functionality into a separate module class.
Essentially, this is necessary because otherwise, the DoRA parameter
lives on the lora.Linear layer as a parameter, not a separate module.
Since the FSDP auto wrap policy operates on the level of modules, not
parameters, there is no way to modify the auto wrap policy to wrap the
DoRA parameter; it must be its own module.

If not for this reason, #1797 would be preferable, since the number of
code changes is smaller overall. In this PR, there are more numerous
changes, but the majority only involves moving code around, not any
actual code changes.

Since we introduce a new submodule, an extra step is required to
ensure that old DoRA state dicts can still be loaded correctly. This
involves a fairly trivial extra remapping step in
set_peft_model_state_dict. The test for this is performed via the new
regression DoRA tests introduced in #1792.

Similarly, there is a remapping step involved in
get_peft_model_state_dict to ensure that when new state dicts with DoRA
are saved, they still conform to the old format.

An additional required change was to make a defensive copy of the base
layer before dequantizing its weight in order to calculate the weight
norm for DoRA. Without this defensive copy, some side-effect is
triggered in FSDP that results in

> ValueError: Cannot flatten integer dtype tensors

even though the compute dtype of bnb is correctly set to float.

Creating a fully functioning deepcopy does not currently work with 8bit
BNB, but there is a fix. Once the next BNB release is out, 8bit BNB will
be tested and enabled.

While working on this, I also noticed a small bug that dropout was not
correctly applied when using QDoRA. This is now also fixed.

This PR was tested successfully with FSDP and (Q)DoRA using the scripts
in examples/sft/ with a modification to enable DoRA.
2024-05-31 16:56:21 +02:00
cb0bf07774 MNT Remove deprecated use of load_in_8bit (#1811)
Don't pass load_in_8bit to AutoModel.from_pretrained, instead use
BitsAndBytesConfig.

There was already a PR to clean this up (#1552) but a slightly later
PR (#1518) re-added this usage.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-30 15:39:26 +02:00
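A minimal sketch of the recommended replacement described above; the model name is a placeholder. A BitsAndBytesConfig is passed via quantization_config instead of the deprecated load_in_8bit argument.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
```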
8cd2cb613b CI Make torch compile tests run on GPU (#1808)
Many of these tests require a GPU to run, so use custom runners.

Code was mostly copied from existing workflows.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-30 12:37:18 +02:00
e7b75070c7 TST: Add simple BNB regression tests (#1602)
These are very basic and simplistic regression tests for bnb. Their
purpose is to ensure that there is no unnoticed change in bnb that leads
to different outputs. There is no check for "correctness", just that the
results haven't changed.

Eventually, this workflow should be improved and moved to the bnb repo.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-28 11:36:38 +02:00
1b262167f3 Docs / LoRA: Add more information on merge_and_unload docs (#1805)
* put back lora merging diagram

* push

* Update docs/source/developer_guides/lora.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-05-28 11:13:44 +02:00
39c60ffca9 TST Add regression test for DoRA, VeRA, BOFT, LNT (#1792)
These new methods were added but the regression tests were not extended
yet. This PR adds regression tests for these methods. The regression
artifacts have been pushed based on PEFT v0.11.1. The new tests pass
locally.
2024-05-27 12:00:47 +02:00
8304017a9a FIX BOFT device error after PR 1742 (#1799)
PR #1742 introduced the feature that adapters of the same layer can be
on different devices. A new method was introduced that is responsible
for moving the parameters related to a specific adapter in a consistent
way.

In BOFT, however, one parameter was overlooked, boft_P. This parameter
is not stored inside a ParameterDict or ModuleDict, hence it was not
moved. The reason is (presumably) that this parameter is shared between
all BOFT adapters, as it's always identical. However, this clashes with
having different adapters on different devices.

To solve this, the parameter is now moved on the fly to the correct
device during the forward pass.
2024-05-27 10:12:22 +02:00
b2922565c4 TST Install bitsandbytes for compile tests (#1796)
Also, remove outdated comment.
2024-05-23 16:12:57 +02:00
3cf5359f11 FIX Allow same layer adapters on different devices (#1742)
The issue is that so far, we made the assumption in PEFT that all
adapter weights of a specific layer are on the same device. There can be
cases where it is useful to have adapters on different devices. E.g.
when a user loads a lot of LoRA adapters and wants to offload those not
currently in use to CPU, they would not currently be able to do so.

With this PR, we add this possibility. To achieve this, when we update
an adapter layer with a new adapter, we only move that specific adapter
to the device of the base layer, without touching the other loaded
adapters.

While working on this, I discovered a small bug in VeRA when adding
multiple adapters, which is now also fixed.
2024-05-23 10:54:40 +02:00
cb7aedd9ba fix docs (#1793) 2024-05-23 11:37:30 +05:30
47745d57c2 FIX Use correct attribute name for HQQ in merge (#1791)
Without this fix, test_hqq_lora_model_outputs currently fails.
2024-05-22 16:35:27 +02:00
1fec23152a DOC TST Reproducibility of models using batch norm (#1734)
Fixes #1732

After loading a model that was trained with PEFT on a base model with
some kind of batch norm layer, the loaded model should produce the same
output. Right now, this does not happen.

The reason is that during training, buffers for running mean etc. are
updated, but they are not saved when calling save_pretrained on the
PeftModel instance. Normally in PEFT, we assume that during training,
the base model parameters are kept constant, which is not the case with
batch norm. We only save the PEFT parameters and assume that when the
user loads the base model, all parameters are restored exactly. That
way, the information in the buffers is lost completely.

The fix is to add the batch norm layers to modules_to_save. This fix is
now documented and tested.
2024-05-22 10:43:29 +02:00
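A hedged sketch of the documented fix: add the batch norm layers to modules_to_save so their updated buffers are stored alongside the adapter. The module names below are placeholders for the real layer names in the base model.

```python
from peft import LoraConfig

config = LoraConfig(
    target_modules=["q_proj", "v_proj"],          # placeholder target modules
    modules_to_save=["encoder.batch_norm"],       # placeholder batch norm layer name
)
```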
bc6a99906c FIX Warning abt config.json when the base model is local. (#1668)
Fix incorrect warning when loading local model.
2024-05-21 15:45:06 +02:00
691bc22ea6 ENH Layer/model status shows devices now (#1743)
For each adapter, show all the devices of this adapter's parameters.

Also, while working on this, found a very minor bug in VeRA as its
linear layer didn't implement its own __repr__.
2024-05-21 15:35:51 +02:00
fb7f2796e5 Add add_weighted_adapter to IA3 adapters (#1701)
* Add add_weighted_adapter to IA3 adapters

* Refactor to simplify code

* refactor test

* Add IA3 merging docs

* Update docs/source/developer_guides/model_merging.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update docs/source/developer_guides/model_merging.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address PR feedback

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-05-17 22:29:22 +05:30
4e32679f37 TST: torch compile tests (#1725)
Right now, we don't have specific tests for torch.compile. Instead, we
have a "hack" that allows to run _all_ tests with torch.compile if we
set the environment variable PEFT_DEBUG_WITH_TORCH_COMPILE=1.

This is not very practical because it takes a lot of time to run all
these tests with compilation enabled. Also, currently hundreds of tests
are failing, which makes it impossible to understand more closely what
does or does not work.

This PR removes the aforementioned "hack" and instead replaces it with a
list of explicit torch.compile tests. Currently, these tests cover
training/inference with a bunch of different tuner types, as well as
more advanced features with LoRA (e.g. quantization, multiple adapters,
etc.).

Some of these tests pass and some of them fail. This is documented now,
so that users can quickly look up if their use case would be compatible
with torch.compile. This is very useful to have, because sometimes
torch.compile may appear to work but actually returns the wrong result.
For users, it's not immediately obvious when this happens.

The test suite is not exhaustive, there are many combinations of
features that could be added. However, it should be a good starting
point and can be extended later.

The test suite does _not_ cover whether torch.compile actually
accelerates the code. This may not be the case even if it works
correctly (e.g. because of graph breaks). Testing this would require
bigger models and more data, which is prohibitively slow to test.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-05-17 18:03:27 +02:00
3f7aacd601 Bump version to 0.11.2.dev0 (#1741)
After patch release of 0.11.1.
2024-05-17 15:37:30 +02:00
e3eeabfad2 FIX BOFT setting env vars breaks C++ compilation (#1739)
Resolves #1738
2024-05-17 12:43:03 +02:00
ae1ae20b76 Autocast adapter weights if fp16/bf16 (#1706)
As discussed internally, we want to automatically cast the weights of
the adapter to float32 when using float16. Float16 is not conducive to
stable training and raises errors when used with AMP.

Previously, we had to recommend to users to manually cast the weights
if they loaded the base model in float16, because PEFT would choose the
same dtype for the adapter as for the base weights. Forgetting this is a
common source of errors, so we choose to automate this.

If this causes trouble, users can prevent the behavior by passing
autocast_adapter_dtype=False to get_peft_model,
PeftModel.from_pretrained, or PeftModel.load_adapter.

This PR should be reviewed carefully, as it has the potential to break
existing code if something important was missed. We also need to add a
note for the upcoming release text about this change in behavior.
2024-05-16 17:11:36 +02:00
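A minimal sketch of opting out of the new behavior described above; the base model is a placeholder. By default, adapter weights are now upcast to float32 when the base model uses float16 or bfloat16.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)
# Pass autocast_adapter_dtype=False to keep the adapter weights in float16.
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"), autocast_adapter_dtype=False)
```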
2535036c24 ENH Save and load base model with revision (#1658) 2024-05-16 16:27:53 +02:00
e003ae7850 Bump version to 0.11.1.dev0 (#1736) 2024-05-16 12:34:29 +02:00
0649947396 Release: v0.11.0 (#1733) 2024-05-16 11:41:41 +02:00
b5acf5d6be Add PiSSA as an initialization method of LoRA (#1626)
Implements https://huggingface.co/papers/2404.02948.
2024-05-15 11:35:39 +02:00
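A minimal sketch of enabling PiSSA initialization for LoRA; the target modules are illustrative. According to the PEFT docs, a faster SVD variant can be selected with strings of the form "pissa_niter_[n]".

```python
from peft import LoraConfig

# init_lora_weights="pissa" initializes the LoRA weights from the principal
# singular values and vectors of the base weights.
config = LoraConfig(init_lora_weights="pissa", target_modules=["q_proj", "v_proj"])
```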
748f7968f3 FIX Allow DoRA init on CPU when using BNB (#1724)
Resolves #1674

For some users, it is necessary to initialize the model on CPU, even
when using BitsAndBytes, which requires a GPU eventually. Since DoRA
requires dequantizing the BNB weights at initialization, we need to
temporarily move the corresponding model weights to the GPU. After
dequantization, the weights are moved back to CPU.
2024-05-14 17:10:23 +02:00
47b3712898 DOC Document the PEFT checkpoint format (#1717)
Description of the PEFT checkpoint format and what it takes to convert
to it.

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-05-14 11:38:02 +02:00
2558dd872d Workflow: Add slack messages workflow (#1723)
* add slack messages workflow

* Update .github/workflows/build_docker_images.yml

* Update .github/workflows/build_docker_images.yml

* fix
2024-05-13 17:27:01 +02:00
6f41990da4 FIX Trailing ws in revise run_peft_multigpu.sh (#1722) 2024-05-10 11:47:52 +02:00
d8fec400c7 DOC Fix incorrect method name (#1719) 2024-05-09 12:19:50 +02:00
32f3878870 DOC Some small cleanups in docstrings, copyright note (#1714) 2024-05-07 12:50:19 +02:00
cb08d095a5 support Cambricon MLUs device (#1687)
* support mlu device

* rollback

* up

* add version check for mlu

* better accelerate version check for mlu device

* fix error with make style
2024-05-07 12:40:46 +02:00
86d086ec37 FEAT Helper to check if a model is a PEFT model (#1713) 2024-05-07 11:06:03 +02:00
02ae6bcb37 Add LoRA support to HQQ Quantization (#1618)
* Add HQQ Lora

* fix error weight load

* Remove unused

* Add quantized lora

* fix make HQQLinear

* Fix dtype

* Revert back quantize lora

* Add prepare training for hqq quantization

* Forget revert hqq

* Remove warnings

* Other ways to check hqq quantization

* Add unit test for training

* change bfloat16 to float16

* Fix load weight when applied dora

* Move import hqq inside if clause

* Naming using CamelCase

* Remove unused function and fix naming convention

* Pop offload_meta

* Add use_dora params

* Remove confusing comments

* Additional test for checking output from HQQ

* Add license notice

* Add parameter decorator

* Redundant calling get_base_layer

* do make style

* Remove unused comments

* Move dispatch_hqq out of if clause

* make style all scripts

* Add comment for explanation

* Mention HQQ to docs

* Add HQQ to Dockerfile

* Fix styling

* Styling scripts

* Comply with transformers HQQ integration

* Test fully using transformers

* Add comments handling HQQ

* Fix naming problem
2024-05-03 15:43:26 +02:00
77b7238b90 fix the fsdp peft autowrap policy (#1694)
* fix the fsdp peft autowrap policy

* address comment wrt backwards compatibility
2024-05-01 09:08:55 +05:30
3edcebf713 Set experimental dynamo config for compile tests (#1698)
Right now, a lot of tests fail when applying torch.compile to PEFT
models. One of the main reasons is that attribute checks (self.foo) on
nn.Modules are not correctly considered.

This PR sets an experimental flag that should fix this. However, this is
not public PyTorch API (yet) and incurs a performance penalty. Still,
it's interesting to see how this affects our tests.

More context:
https://github.com/pytorch/pytorch/issues/124717#issuecomment-2083235776
2024-04-30 14:32:20 +02:00
e0cb15e2ee FIX Use different doc builder docker image (#1697)
Same as in:

c712d05aa8/.github/workflows/build_documentation.yml (L19)
2024-04-30 13:30:07 +02:00
3ec55f4ac4 FEAT Add LayerNorm tuning (#1301)
LN tuning based on: https://arxiv.org/abs/2312.11420
2024-04-30 12:21:38 +02:00
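A hedged sketch of applying the new LayerNorm tuning method; the targeted layer names depend on the architecture and, like the base model, are illustrative here.

```python
from transformers import AutoModelForCausalLM
from peft import LNTuningConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
# Only the listed LayerNorm modules are made trainable.
config = LNTuningConfig(task_type="CAUSAL_LM", target_modules=["final_layer_norm"])
model = get_peft_model(base, config)
model.print_trainable_parameters()
```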
608a90ded9 TST: Skiping AWQ tests for now .. (#1690)
* Update test_gpu_examples.py

* Update tests/test_gpu_examples.py
2024-04-29 18:27:13 +02:00
e19f7bf424 FIX Doc error prompt tuning seq len calc (#1686)
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2024-04-29 16:23:46 +02:00
250b7eb85f FEAT Show adapter layer and model status (#1663)
This PR adds a new feature to PEFT models that makes it easier to
understand the status of the adapter(s) on the model. Quoting from the doc
entry that I added:

Sometimes, the PEFT model can end up in a bad state, especially when
handling multiple adapters. There can be some confusion around what
adapters exist, which one is active, which one is merged, etc. To help
investigate this issue, you can call the
get_layer_status and the
get_model_status methods. The first one gives you a
detailed overview about the adapters for each targeted layer. The latter
one gives you a high-level overview about the model status.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-04-29 13:31:23 +02:00
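A minimal sketch of the inspection methods named in the commit above, assuming `model` is an existing PeftModel.

```python
# Per-layer view: which adapters exist, which are active/merged, their devices, ...
layer_status = model.get_layer_status()
# High-level summary aggregated across the whole model.
model_status = model.get_model_status()
print(model_status)
```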
f5f7b67d60 FIX Issues with AdaLora initialization (#1652)
Resolves #1647

- AdaLoraConfig now converts target_modules to set, same as LoRA
- AdaLoraConfig now raises when used with DoRA
- AdaLoraConfig now raises when used with LoftQ
- AdaLoraModel now raises when trying to call add_weighted_adapter
- Add tests for those in test_initialization.py
- Small clean ups in test_initialization.py
2024-04-29 13:09:34 +02:00
7a22b7daf0 FIX bf16 dtype issue for IA3 (#1634)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-04-29 11:50:42 +02:00
e7b47ac01d FIX Init DoRA weights in float32 if float16 used (#1653)
When DoRA weights are initialized in float16 on CPU and when an older
PyTorch version is being used (<2.2), there is an error because the
operation is not supported for float16 on CPU. This commit temporarily
converts the LoRA weights to float32 beforehand if they're in float16.

Of course, when the user tries to train or predict with this model on
CPU, they will still encounter errors. However, in certain situations,
only the initialization might be on CPU and later it is moved to GPU.
This could be some framework code that the user has no control over, as
in #1597. Therefore, it's good to have this safety hatch.

Note that since our CI uses the latest PyTorch version, we cannot run a
test for this, as the latest PyTorch runs no matter what.
2024-04-29 11:35:47 +02:00
8bc3c0861d Update Dockerfile (#1684) 2024-04-26 15:49:02 +02:00
383e1fab0e Update build_docker_images.yml (#1682) 2024-04-26 10:48:05 +02:00
d0fa70aeb6 FEAT: Add EETQ support in PEFT (#1675)
* v1

* fix tests'

* fix unneeded change

* fix unneeded change

* fix unneeded change

* fix

* fix CI

* fix docker image

* fix docker image

* add docs

* lazy import

* raise when merge

* raise when merge

* Update eetq.py

* merge

* style

* add unmerge

* indent

* Update docs/source/developer_guides/quantization.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* add details about transformers

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-04-26 10:20:18 +02:00
b1d6c77108 FIX Don't eagerly import bnb for LoftQ (#1683)
We accidentally added code in loftq_utils.py that eagerly imports bnb,
which we want to avoid to prevent CUDA from being initialized too early.
2024-04-25 20:35:16 +02:00
f0d3c6b892 FIX Use trl version of tiny random llama (#1681)
Using the version from HuggingFaceM4 broke our tests because it was
updated. Although the update was reverted, it is still better to switch to this
version, which is explicitly for testing and should be stable.
2024-04-25 15:15:57 +02:00
3d9529d190 FIX / Workflow: Fix Mac-OS CI issues (#1680)
* Update helpers.py

* Update tests.yml

* Update src/peft/helpers.py
2024-04-25 14:28:03 +02:00
835181460c ENH: Add multi-backend tests for bnb (#1667)
* add multi-backend tests for bnb

* Create README.md

* Update build_docker_images.yml
2024-04-25 13:08:33 +02:00
3d6520e2eb DOC DeepSpeed and QLoRA compatibility (#1679) 2024-04-25 11:43:42 +02:00
5a4b9cade6 VeRA (Vector-based Random Matrix Adaptation) (#1564)
Implements VeRA: https://huggingface.co/papers/2310.11454

VeRA is similar to LoRA but even more parameter efficient, while promising to
keep the same performance. In its current implementation, it has a few
limitations compared to LoRA:

- All targeted parameters must have the same shape.
- Only `nn.Linear` layers are supported.
- Quantized layers are not supported.

This PR is based on, and supersedes, #1039.

---------

Co-authored-by: Alex McKinney <alex.f.mckinney@gmail.com>
Co-authored-by: Dawid <20214809+dkopi@users.noreply.github.com>
2024-04-19 10:55:58 +02:00
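A minimal sketch of configuring VeRA; the model name and target modules are illustrative, and per the limitations above all targeted layers must be nn.Linear with identical shapes:

    from transformers import AutoModelForCausalLM
    from peft import VeraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    # q_proj and v_proj have identical shapes in this architecture
    config = VeraConfig(r=256, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(base, config)
    model.print_trainable_parameters()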
144b7345c2 ENH Support safetensor in multitask_prompt_tuning (#1662)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-04-18 13:48:50 +02:00
bdb856786e MNT Remove dreambooth git submodule (#1660)
Leftover that was not removed in BOFT PR.
2024-04-18 13:47:47 +02:00
ed865e2812 FIX Bug with handling of active adapters (#1659)
There was a bug for some models like IA3, LoHa, etc., where calling
set_adapter would not correctly update the active_adapter. This is now
fixed.

Note that this is not about the active_adapter attribute on PeftModel or
layers, which are handled separately.

This PR also ensures that LoraModel, IA3Model, etc. consistently use
self.active_adapters, not self.active_adapter. The latter should be
treated more like a private attribute (but this isn't changed for
backwards compatibility).
2024-04-17 13:39:33 +02:00
56773b9a92 ENH Float fmt in print_trainable_parameters (#1648)
This PR replaces `trainable%: 0.5916145025956931` with `trainable%:
0.5916`, as is already done in `src/peft/mixed_model.py`.
2024-04-15 13:51:05 +02:00
c8974c5880 DOC Update figure assets of BOFT (#1642) 2024-04-12 15:23:17 +02:00
7671926243 FIX Errors in the transformers integration docs (#1629)
- Typo: wrong variable name
- Typo: disable_adapter -> disable_adapters
- Remove link to PeftModel.disable_adapter, because it's not the same
  method
- Mention the enable_adapters method too
2024-04-12 14:43:16 +02:00
9f0cfc9919 Don't use deprecated Repository anymore (#1641)
* Don't use deprecated Repository anymore

* oops
2024-04-12 13:51:37 +02:00
811169939f BOFT: Orthogonal Finetuning via Butterfly Factorization (#1326)
Implements https://hf.co/papers/2311.06243.

---------

Co-authored-by: Zeju Qiu <zeju.qiu@gmail.com>
Co-authored-by: Yuliang Xiu <yuliangxiu@gmail.com>
Co-authored-by: Yao Feng <yaofeng1995@gmail.com>
2024-04-12 13:04:09 +02:00
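A minimal sketch of applying BOFT via its config class; the model, block size, and target modules are assumptions for illustration only:

    from transformers import AutoModelForCausalLM
    from peft import BOFTConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    # the in_features of the targeted layers must be divisible by boft_block_size
    config = BOFTConfig(boft_block_size=4, target_modules=["q_proj", "v_proj"])
    model = get_peft_model(base, config)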
b0f1bb468c add deepspeed support for adalora finetune (#1625)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-04-12 15:02:33 +05:30
31c884e934 FEAT Allow load_adapter to use different device (#1631) 2024-04-10 11:39:02 +02:00
88875f1cf5 FIX Correctly call element_size (#1635)
Should fix the error introduced by #1630.

AFAICT, element_size should be called on the parameter, not the dtype.
Unfortunately, I had issues getting older PyTorch versions to work with
bnb, so I haven't tested the initial issue.

To be safe, I also re-added the previous code path using itemsize,
although it might be unnecessary (we would have to check the PyTorch
code to verify when the different attributes/methods were added).
2024-04-09 15:38:57 +02:00
0d283ae0e6 FIX Multiple adapters and modules_to_save (#1615)
Previously, we had the bug that if we had multiple adapters, some with
modules_to_save and others without, when trying to switch to an adapter
without modules_to_save, the ModulesToSaveWrapper would raise an error
because it cannot find that adapter. Now, when it detects this, it is
just disabled (so it uses the original weight).

Moreover, we had the issue that when we were using classes such as
PeftModelForSequenceClassification, we implicitly added the classifier
layers to model.modules_to_save. However, this would only add a new
ModulesToSaveWrapper instance for the first adapter being initialized.
When initializing a 2nd adapter via add_adapter, this information was
ignored. To fix this, I now update the peft_config.modules_to_save to
explicitly add the classifier layers. This is a departure from how this
worked previously, but I couldn't find a better way to ensure that
this bug was fixed.

Finally, there was a bug in add_weighted_adapters when we were merging
multiple adapters with modules_to_save. Previously, when we called
model.add_weighted_adapter, the LoRA weights were merged and a new
ModulesToSaveWrapper was added for the new adapter based on the first
LoraConfig of the two adapters. This ModulesToSaveWrapper is just a copy
of the original weights. Thus, when we switch to the newly merged
adapter, we just use the original weights for modules_to_save. This
doesn't make a lot of sense and is probably surprising for the user.
Now, we raise an error when we detect this to alert the user to this
fact.

Note that when only one of the adapters to be merged has a
modules_to_save, this does not raise an error, instead that module is
being used.
2024-04-09 12:59:25 +02:00
e07095a654 itemsize is torch>=2.1, use element_size() (#1630) 2024-04-08 16:37:31 +02:00
5b60ec0204 FEAT: Allow ignoring mismatched sizes when loading (#1620)
When users pass ignore_mismatched_sizes=True to
PeftModel.from_pretrained, the mismatched tensors will be ignored
instead of raising an error. This is in line with how transformers
handles this argument.
2024-04-08 12:23:35 +02:00
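A sketch of the new flag; the base model and the adapter path are placeholders:

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    model = PeftModel.from_pretrained(
        base_model,
        "path/to/adapter",             # placeholder adapter checkpoint
        ignore_mismatched_sizes=True,  # skip tensors with mismatched shapes instead of raising
    )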
dfac641c63 Update nightly-bnb.yml (#1628) 2024-04-08 11:23:09 +02:00
26726bf1dd FIX Make DoRA work with Conv1D layers (#1588)
Previously, the code was not transposing the weights when calculating the norm.

In addition, update some DoRA tests to use BitsAndBytesConfig.
2024-04-05 15:26:37 +02:00
16ec3f995a FIX: bnb config wrong argument names (#1603)
Several tests were using bnb_4bit_compute_type but the argument should
be called bnb_4bit_compute_dtype. Now the correct name is used.

This change should not affect the tests, because they were passing the
default value anyway. Therefore, the fact that this argument was passed
incorrectly (and thus, presumably, ignored) should not affect the
results.

Also, fix another incorrect argument to bnb config. These were caused by an
incorrect search and replace operation in #1552.
2024-04-05 12:26:52 +02:00
ca6bbb594f FIX Remove duplicated import in notebook (#1622) 2024-04-05 12:00:31 +02:00
2e821c1dc8 FIX Use correct model in image clf notebook (#1624) 2024-04-05 11:59:58 +02:00
8452d71e14 fix the torch_dtype and quant_storage_dtype (#1614)
* fix the torch_dtype and quant_storage_dtype

Co-Authored-By: Gabriel Altay <gabriel.altay@gmail.com>

* quality

---------

Co-authored-by: Gabriel Altay <gabriel.altay@gmail.com>
2024-04-05 00:14:56 +05:30
02b5aeddf9 MNT: Update GH bug report template (#1600)
* Update GH bug report template

* Formatting
2024-03-28 16:14:44 +05:30
c4c826c0c0 FIX deepspeed zero3+prompt tuning bug (#1591)
Fixes error

> word_embeddings.weight shape is torch.Size([0])

See https://github.com/huggingface/optimum-habana/pull/758 for context.

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-03-27 12:21:53 +01:00
d582b68c7f FEAT Add cache to import_utils calls (#1584)
Otherwise, repeated calls with a big PYTHONPATH can lead to considerable
slowdowns, as there is no caching.
2024-03-26 14:28:22 +01:00
4537317961 Extend from_pretrained to models with disk-offloaded modules (#1431)
This PR extends PeftModel.from_pretrained() to be compatible with disk
offloading. The approach is to re-make the offload_index map to the
original (offloaded) model's safetensors files and then save the LoRA
parameters with renamed model module parameters to new safetensors
folders, updating the offload_index accordingly.

This is a complement to PRs #1190 and transformers #27412, and is
designed to allow for the loading of PeftModels using only the memory
equivalent of one safetensors file.
2024-03-26 11:19:04 +01:00
ffb8512396 DOC Section on using transformers pipeline (#1587) 2024-03-25 16:00:28 +01:00
78daa4cf76 Update lora.md (#1582) 2024-03-22 13:44:24 +01:00
65b75a6798 FIX Minor issues in docs, re-raising exception (#1581)
- Typos in docs
- Add mention that DoRA now supports Conv2d
- Re-raise a caught exception (to not hide original error)
2024-03-22 16:57:46 +05:30
a62b337940 Bump version to 0.10.1.dev0 (#1578) 2024-03-21 12:02:50 +01:00
8221246f2f Release: v0.10.0 (#1573)
Besides updating versions, removed 2 deprecations.
2024-03-21 10:16:25 +01:00
8e979fc732 More convenient way to initialize LoftQ (#1543)
Related to #1532

At the moment, using LoftQ is quite cumbersome, as shown in this
example:

7e84dec20b/examples/loftq_finetuning

Essentially, users have to:

1. Load the non-quantized model with LoftQ (which can be quite huge)
2. Modify the PEFT config
3. Save the adapter
4. Unwrap the base model
5. Save the base model with modified weights (i.e. a whole copy of the
   base model)
6. Load the base model from step 5 with bnb quantization
7. Load the adapter from step 3

Yes, there is a helper script to do this, but it still has the
disadvantage that we need to load the non-quantized model and that we have
to create a completely new model checkpoint with the modified weights.

This PR aims to make this process more convenient by adding a single
function replace_lora_weights_loftq. This function takes the
bnb-quantized LoRA model as input. Then it goes through each module with
LoRA weights, lazily loads the corresponding non-quantized weights one
at a time using safetensors, computes the quantization error, and
replaces the LoRA weights with LoftQ-initialized LoRA weights.

This is much more convenient because we only require very little extra
memory thanks to lazy loading, and we don't have to keep an extra copy
of the weights.

While working on this, I still found that LoftQ initialization often did
not seem to help a lot, as mentioned in #1532. I measured this by
creating (1) logits with the base model, (2) with the quantized+LoRA
model, and (3) with the quantized+LoRA+LoftQ model. The expectation is
that (1) should be closer to (3) than to (2). This was often not the
case.

I therefore added the possibility to run a check each time that we
replace a LoRA weight with the LoftQ weights. If this check returns
True, we proceed to the next weight, otherwise we discard the change.
That way, we only make the replacement with LoftQ weights if we see a
real improvement. Of course, this is only a form of greedy optimization,
but it seems to work in practice. And since it's optional, users can
choose not to use it.

This doesn't support 8bit quantization or the num_iter argument of LoftQ.
However, the replace_lora_weights_loftq function can be called multiple
times in a row for slightly improved results.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-03-20 11:16:07 +01:00
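A minimal sketch of the workflow described above, assuming a bnb-4bit-quantized base model with a freshly initialized LoRA adapter; the model name and import location reflect recent PEFT versions and are otherwise illustrative:

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, replace_lora_weights_loftq

    bnb_config = BitsAndBytesConfig(load_in_4bit=True)
    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", quantization_config=bnb_config)
    model = get_peft_model(base, LoraConfig(target_modules=["q_proj", "v_proj"]))

    # Replace the fresh LoRA weights with LoftQ-initialized ones, lazily streaming
    # the non-quantized weights from the base checkpoint's safetensors files.
    replace_lora_weights_loftq(model)

Per the commit message, an optional callback can additionally be supplied so that a replacement is only kept when a user-defined check (e.g. comparing logits against the base model) reports an improvement.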
a86b29a217 Fix LoftQ docs and tests (#1532)
The docs on how to apply LoftQ have been fixed. In contrast to what they
claimed earlier, it is quite a bit more involved to get LoftQ working,
requiring a complete roundtrip first loading a non-quantized model with
LoftQ, saving the LoRA weights and the modified base model, loading the
just stored base model again but this time with quantization, and
finally loading the LoftQ-initialized adapter on top. The docs now link
to the example which demonstrates how to move through these steps, and
give some tips on how to achieve best outcomes.

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2024-03-20 10:37:35 +01:00
8dd45b75d7 FIX [CI] Fix test docker CI (#1535)
* Update test-docker-build.yml

* Update test-docker-build.yml

* dummy push

* final push

* Update .github/workflows/test-docker-build.yml
2024-03-18 16:07:52 +01:00
91e4b0879d FEAT Mixing different LoRA adapters in same batch (#1558)
This PR revives the work by Sourab in #903. The core logic is
the same between the two PRs. This one should be more complete.

The main idea is to allow the user to mix different LoRA adapters in the
same batch. This is useful when the user wants to perform inference with a
batch that uses different LoRA adapters. Without this, each batch would
have to be restricted to the same LoRA adapter(s).

This PR should encompass:

- all task types
- all LoRA layer types
- bnb layers

Extensive tests were added, as well as documentation.

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-03-18 15:50:00 +01:00
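A minimal sketch of mixed-adapter batch inference; the adapter checkpoints and prompts are placeholders. One adapter name is passed per example in the batch, and "__base__" selects the plain base model:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")  # placeholder paths
    model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

    inputs = tokenizer(["first prompt", "second prompt", "third prompt"], return_tensors="pt", padding=True)
    output = model.generate(**inputs, adapter_names=["adapter_a", "adapter_b", "__base__"])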
a18734d87a Update style with ruff 0.2.2 (#1565)
This is necessary to add to main fast, or else all branches from main
will require these changes to pass the quality checks.
2024-03-15 10:20:41 +01:00
6008f272a5 Changes to support fsdp+qlora and dsz3+qlora (#1550)
* changes to support fsdp+qlora and dsz3+qlora

* address comments

* add example and start docs

* quality

* deepspeed fixes

* dsz3+qlora docs

* section link fix

* add fsdp+qlora docs

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* address comments

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-03-13 15:23:09 +05:30
a9425d1409 TST Report slowest tests (#1556) 2024-03-12 18:02:12 +01:00
3b63996964 Feat: Support for Conv2D DoRA (#1516) 2024-03-12 16:23:17 +01:00
3eb6bbacee QDoRA: Support DoRA with BnB quantization (#1518)
Adds support for DoRA on 4bit and 8bit quantized models with BnB.
Merging also works, with the usual caveats for quantized weights
(results are not 100% identical), but it's not worse than vanilla LoRA.
2024-03-12 12:44:59 +01:00
5471c9a1be Add support for layer replication in LoRA (#1368)
* Add support for layer replication in LoRA

* Add test and update docs

* Address review comments

* Code cleanup and additional model support

* Add docs, address comments

* Add link to example model

* Improve test and fix typos

* Update src/peft/tuners/tuners_utils.py

Fix typo in doc string.

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-03-12 14:52:09 +05:30
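A minimal sketch of the layer_replication option; the model path and layer ranges are placeholders and assume a llama-style architecture with a contiguous list of decoder layers:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("path/to/llama-style-model")  # placeholder, e.g. a 12-layer base
    config = LoraConfig(
        target_modules=["q_proj", "v_proj"],
        # Build a deeper model by stacking layer ranges [0, 8) and [4, 12); the
        # duplicated layers share base weights but get their own LoRA adapters.
        layer_replication=[[0, 8], [4, 12]],
    )
    model = get_peft_model(base, config)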
d28fffb917 Add Support for Mistral Model in Llama-Adapter Method (#1433)
* Support Mistral For llama-adapter

* Update src/peft/tuners/adaption_prompt/layer.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* Update src/peft/tuners/adaption_prompt/layer.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* corrected logic and added test

* removed commented out code

* Added separate test functions for Mistral

* missed self.assert

* ruff formatting

---------

Co-authored-by: Prakhar Saxena <prakharsxena11111@gmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2024-03-12 14:51:42 +05:30
6dca6d2292 MNT: Use BitsAndBytesConfig as load_in_* is deprecated (#1552)
Don't pass load_in_4bit or load_in_8bit to AutoModel*.from_pretrained,
as it is deprecated. Instead, pass the appropriate BitsAndBytesConfig to
the quantization_config argument of from_pretrained.
2024-03-11 15:24:48 +01:00
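A sketch of the recommended pattern; the model and quantization settings are illustrative:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
    model = AutoModelForCausalLM.from_pretrained(
        "facebook/opt-125m",
        quantization_config=bnb_config,  # instead of the deprecated load_in_4bit=True argument
    )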
a1fe368bfc FIX: Make adaptation prompt CI happy for transformers 4.39.0 (#1551)
* fix for transformers 4.39.0

* Update src/peft/tuners/adaption_prompt/utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-03-11 12:53:05 +01:00
234bbabd9b FIX Allow AdaLoRA rank to be 0 (#1540)
Resolves #1539

Additionally, a small fix to the AdaLoRA training script was added.
2024-03-11 11:35:50 +01:00
e3840c249e Update prompt_based_methods.md (#1548)
remove duplicate commas
2024-03-11 10:58:40 +01:00
7e84dec20b Optimize levenshtein algorithm in scripts (#1527)
This commit refines the levenshtein_distance algorithm implemented in peft_lora_seq2seq_accelerate_ds_zero3_offload.py to improve its space
complexity from O(n^2) to O(n). Additionally, thorough testing has been
conducted to ensure the correctness and reliability of the revised
implementation.
Also update peft_lora_clm_accelerate_ds_zero3_offload.py
2024-03-07 11:44:22 +01:00
e7e95c004b Fixed minor grammatical and code bugs (#1542)
Line 59 - trainning to training
Line 80 - LoraConfig missing a comma after Lora Dropout
Line 141 - quantizaion to quantization
2024-03-07 08:44:25 +01:00
e597388305 FIX Check requires args for prompt tuning config (#1519) 2024-03-05 15:07:50 +01:00
7662f342e0 DOC: extend docs for get_nb_trainable_parameters() (#1531)
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2024-03-05 12:53:29 +01:00
b58b13b528 Expose bias attribute on tuner layers (#1530)
Expose bias attribute on tuner layers

See #1524

This is similar to how we already expose the weight attribute.
2024-03-05 11:59:40 +01:00
98f4db2c79 FIX [Docs/ bnb / DeepSpeed] Add clarification on bnb + PEFT + DS compatibilities (#1529)
* add clarification on bnb + PEFT + DS

* more clarification

* clarifications

* more clarification
2024-03-05 06:55:20 +01:00
c43cc5028e Update test-docker-build.yml (#1534) 2024-03-05 03:40:47 +01:00
84abf5a5ab FIX [CI / Docker] Follow up from #1481 (#1487)
* Update test-docker-build.yml

* Update test-docker-build.yml

* Update Dockerfile

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update Dockerfile

* Update .github/workflows/test-docker-build.yml

* Update .github/workflows/test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update .github/workflows/test-docker-build.yml

* Update Dockerfile

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* Update test-docker-build.yml

* revert

* Update .github/workflows/test-docker-build.yml

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>

---------

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
2024-03-05 03:32:07 +01:00
34f3fba2b3 Fix for "leaf Variable that requires grad" Error in In-Place Operation (#1372)
Avoid in-place operations for LoRA forward and merging.
2024-03-04 13:42:36 +01:00
9119b780eb Bump version to 0.9.1.dev0 (#1517) 2024-02-29 10:41:07 +01:00
7e5335d093 Release: v0.9.0
Note that we did not set a dev version for 0.8.2, so this PR goes
directly from 0.8.2 to 0.9.0.

No deprecated code or similar to remove.
2024-02-28 11:19:24 +01:00
096fe53737 FEAT Implement DoRA (#1474)
Add DoRA (Weight-Decomposed Low-Rank Adaptation).

https://arxiv.org/abs/2402.09353

To use this with LoRA, add use_dora=True to the LoraConfig.

Currently only supports nn.Linear layers, not other types or
quantized linear layers like bnb.
2024-02-27 12:02:11 +01:00
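A minimal sketch of enabling DoRA on top of a LoRA config; rank and target modules are illustrative:

    from peft import LoraConfig

    config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], use_dora=True)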
90aa2c1e05 ENH: [Docker] Notify us when docker build pass or fail (#1503)
* Update build_docker_images.yml

* Update build_docker_images.yml

* Update build_docker_images.yml

* Update build_docker_images.yml
2024-02-27 10:12:39 +01:00
01732176e0 FIX Safe merging with LoHa and LoKr (#1505)
There was a small bug when merging the LoHa and LoKr tuners with
safe_merge=True due to a missing clone call. This is now fixed.

Furthermore, the test coverage for merging with LoHa and LoKr has been
extended, as there were a few tests where these methods were excluded
unnecessarily.
2024-02-26 10:37:36 +01:00
aa2ca83ca7 add example and update deepspeed/FSDP docs (#1489)
* add example and update deepspeed docs

* fixes

* fixes and update FSDP docs

* fixes and addressing comments

* fixes

* resolve comments

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address comments

* Update fsdp.md

* Update docs/source/accelerate/fsdp.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* addressing comments

* address comments

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-02-26 11:05:27 +05:30
1b3b7b5b2a FIX Bug in prompt learning after disabling adapter (#1502)
There was a bug where, after using the disable_adapter context, the
prepare method was not correctly restored, meaning that generations were
incorrect once the context was exited. This is now fixed.
2024-02-23 10:54:10 +01:00
bc9426f10b Add default LoRA and IA3 target modules for Gemma (#1499) 2024-02-22 08:18:20 +01:00
3967fcc8ea Allow trust_remote_code for tokenizers when loading AutoPeftModels (#1477)
* feat: Allow tokenizer remote code when loading AutoPeftModels

* style: Merge arguments into one line
2024-02-22 05:08:19 +01:00
23213cad8d AQLM support for LoRA (#1476)
* aqlm

* Style and copied tests

* aqlm import guadr

* docs

* correct model in tests

* Update docs/source/developer_guides/quantization.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update docs/source/developer_guides/quantization.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* moved aqlm install and added >=

* Removed `quant_linear_module`

* AqlmLoraLinear

* docs update

* transformers version check

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-02-22 02:31:04 +01:00
2efc36ccdf Raise error on wrong type for modules_to_save (#1496)
Resolves #1492

This PR is for user convenience. When they try to pass the wrong type to
modules_to_save, they will now get an early error message, instead of
getting an obscure error when calling forward later.

Note:

The reason ModulesToSaveWrapper cannot support ModuleDict et al. is
that it tries to call forward on the original or copied module, but
these modules don't implement a forward method.
2024-02-21 13:18:36 +01:00
cc27cfd478 convert SVDLinear dtype (#1495) 2024-02-21 09:54:34 +01:00
b74c2f644d Update peft_bnb_whisper_large_v2_training.ipynb: Fix a typo (#1494) 2024-02-21 09:41:56 +01:00
470b66c639 FIX: [CI / Adaptation Prompt] Fix CI on transformers main (#1493)
* fix CI on transformers main

* better fix
2024-02-21 07:14:18 +01:00
f81147268e FIX Correctly unload double wrapped modules (#1490)
Resolves #1485, but note that some additional solutions are mentioned in
that issue.

This checks that when unloading a PEFT model, if the
ModulesToSaveWrapper contains a tuner module, it is correctly unloaded.
The unloaded model should not have PEFT layers at the end.
2024-02-20 15:12:34 +01:00
37dd675f91 ENH: [CI / Docker]: Create a workflow to temporarily build docker images in case dockerfiles are modified (#1481)
* test workflow

* Update Dockerfile

* build docker images

* Update .github/workflows/test-docker-build.yml

* Update .github/workflows/test-docker-build.yml

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>

---------

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
2024-02-20 03:55:53 +01:00
7b7e4b2194 Better respect result dtype in LoRA layers (#1010) 2024-02-19 15:33:49 +01:00
043d5c0bd6 FIX: Multitask prompt tuning with other tuning init (#1144)
Resolves #1082.

Also, adding tests for prompt_tuning_init != RANDOM.

---------

Co-authored-by: Mayank Mishra <32954280+mayank31398@users.noreply.github.com>
2024-02-19 13:53:39 +01:00
8a0dce2fb9 FIX [PromptTuning] Simple fix for transformers >= 4.38 (#1484)
* fix for transformers >= 4.38

* style
2024-02-19 13:33:24 +01:00
ede3c7df22 FIX [CI / bnb] Fix failing bnb workflow (#1480)
* Update nightly-bnb.yml

* Update nightly-bnb.yml

* Update nightly-bnb.yml
2024-02-19 12:41:35 +01:00
cf467d8aa0 add paths to run tests only when relevant files are modified (#1482) 2024-02-19 12:24:15 +01:00
47c4d9578c Add pre-commit configuration (#1467) 2024-02-19 12:11:01 +01:00
65513e5db4 FEAT: add AWQ support in PEFT (#1399)
* add AWQ support in PEFT

* fix

* fix

* Update src/peft/tuners/lora/awq.py

* style & fix tests

* forward contrib credits from PR14084

* forward contrib credits from autoawq PR

* change name

* fix

* change to peft internal testing

* fix

* fix

* add multi-GPU tests

* add to dockerfile

* fix todo

* raise error only at the dispatch level

* quality

* fix test

* fix dockerfile

* fix

* fix

* update dockerfile and tests

---------

Co-authored-by: s4rduk4r <s4rduk4r@users.noreply.github.com>
2024-02-19 01:31:21 +01:00
f5a95930c2 Update docstring at peft_types.py (#1475)
* Update docstring at peft_types.py

Docstring shows a missing underscore at TaskType "Causal LM"

* Update peft_types.py

Using correct capital letters
2024-02-18 14:43:58 +01:00
963e3128ed [CI] Fix adaptation prompt CI on transformers main (#1465)
* fix adaptation prompt CI

* add fix

* forward contrib credits from discussion

* add docstring

---------

Co-authored-by: BenjaminBossan <BenjaminBossan@users.noreply.github.com>
2024-02-18 14:38:24 +01:00
8db74d42c4 TST Make more tests work with MPS (#1463) 2024-02-16 12:16:49 +01:00
a564779b67 Add files via upload (#1471) 2024-02-16 11:35:14 +05:30
cde8f1af2b [docs] Model merging (#1423)
* content

* code snippets

* api reference

* update

* feedback

* feedback
2024-02-15 08:13:54 -08:00
25dec602f3 add magnitude_prune merging method (#1466)
* add `magnitude_prune` merging method

* Update model.py

* 😅
2024-02-15 17:59:39 +05:30
83de1af281 Add default IA3 target modules for Mixtral (#1376)
* Add default LoRA target modules for Mixtral

* Add IA3 modules for Mixtral

* Address comments
2024-02-15 04:15:45 +01:00
5f2084698b TST Use plain asserts in tests (#1448)
Use pytest style asserts instead of unittest methods.

Use `pytest.raises` and `pytest.warns` where suitable.
2024-02-14 16:43:47 +01:00
e95dc1360b [CI] Add CI tests on transformers main to catch early bugs (#1461)
* add new test

* Update .github/workflows/tests-main.yml
2024-02-14 10:41:42 +01:00
234774345b fix llama rotary embedding issue (#1459) 2024-02-13 12:26:07 +01:00
7d0c0a33d3 [core / get_peft_state_dict] Ignore all exceptions to avoid unexpected errors (#1458)
* ignore all exceptions

* Update src/peft/utils/other.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2024-02-13 12:25:56 +01:00
7716dd86e9 [docs] Docstring typo (#1455)
* fix typo

* fix
2024-02-12 09:31:15 -08:00
60ec4d8502 remove iframe embed (#1456) 2024-02-12 09:28:33 -08:00
6e953810af FIX Honor HF_HUB_OFFLINE mode if set by user (#1454)
Resolves #1452

If users enable offline mode, don't perform checks for files on HF Hub,
as they would fail.
2024-02-12 19:11:35 +05:30
a1c472f08f Support modules_to_save config option when using DeepSpeed ZeRO-3 with ZeRO init enabled. (#1450)
* Update other.py

* Update other.py

* fix quality

* Update other.py
2024-02-09 18:35:33 +05:30
055e4dbe1e FIX Loading with AutoPeftModel.from_pretrained (#1449)
Fixes #1430

When using AutoPeftModel.from_pretrained, there is a check to see if a
tokenizer can be found. This check will include a search for the
tokenizer on HF Hub. However, when the model is stored locally, the path
may not be a valid HF Hub repo ID. In that case, an error is raised by
huggingface_hub.

This PR consists of catching that error, and assuming that if the error
occurs, the tokenizer does not exist. This resolves the issue.
2024-02-09 11:00:03 +01:00
c1a83fd692 Add new merging methods (#1364)
* add code

* update docstring

* quality

* fix test

* fix test

* fix svd embedding layer merging

* fixes

* fixes

* Update model.py

* Add test and example

* quality

* fix tests

* update the example

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* address comments

* address comments and add co-authors

Co-Authored-By: Prateek Yadav <15224633+prateeky2806@users.noreply.github.com>
Co-Authored-By: Yu Le <55241218+yule-buaa@users.noreply.github.com>
Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* quality

* Update merge_utils.py

* revert

* address comments

* address comment

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Prateek Yadav <15224633+prateeky2806@users.noreply.github.com>
Co-authored-by: Yu Le <55241218+yule-buaa@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-02-09 12:10:04 +05:30
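A minimal sketch of merging two loaded adapters with one of the newly added methods; adapter names, paths, weights, and the density value are illustrative:

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="adapter_a")  # placeholder paths
    model.load_adapter("path/to/adapter_b", adapter_name="adapter_b")

    model.add_weighted_adapter(
        adapters=["adapter_a", "adapter_b"],
        weights=[0.7, 0.3],
        adapter_name="merged",
        combination_type="ties",  # one of the newly added methods; dare variants work similarly
        density=0.5,
    )
    model.set_adapter("merged")   # activate the newly created adapter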
7da7f85188 DOC How to freeze adapter after set_adapter call (#1447) 2024-02-08 14:39:46 +01:00
eba459553c [docs] IA3 (#1373)
* first draft

* add to toctree

* feedback

* feedback
2024-02-07 11:49:09 -08:00
9bb83ed1a5 [docs] Lora-like guides (#1371)
* loras

* review

* fix

* feedback

* feedback
2024-02-07 11:22:07 -08:00
b5492db514 Update Dockerfile to reflect how to compile bnb from source (#1437)
* Update Dockerfile

* Update Dockerfile

* Update build_docker_images.yml

* Update build_docker_images.yml

* Update Dockerfile

* Update Dockerfile

* add cmake to dockerfile

* use pip install instead

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* final fix

* Update .github/workflows/build_docker_images.yml
2024-02-07 18:53:10 +01:00
ddf90a8b2f TST Improve test coverage by skipping fewer tests (#1445)
Many of the common tests are skipped because of lines such as:

if config_cls not in (LoraConfig, IA3Config):
    return

These lines were often added before we had more PEFT methods like OFT,
LoHa, etc. However, these new methods should also pass the common tests.
Therefore, I relaxed many of these conditions so that they would not
skip the new methods.

Note:

There were a handful of test cases that failed. I added TODO comments
for those, as it was unclear to me why they failed. As investigating
this could take some time, I chose not to fix those cases in this PR.
2024-02-07 14:59:48 +01:00
97f3ed577e MNT Check only selected directories with ruff (#1446)
In PR #1421, ruff was extended to check all directories. This is fine
for those directories that come with PEFT. However, developers may have
other local directories that they do not want to be checked. Therefore,
it is better to list the directories to be checked rather than checking
all.
2024-02-07 14:00:13 +01:00
17273aa4bf [Docs] call set_adapters() after add_weighted_adapter (#1444) 2024-02-07 12:59:49 +01:00
fc78a2491e MNT Move code quality fully to ruff (#1421) 2024-02-07 12:52:35 +01:00
497bbeafbd [core/TPLinear] Fix breaking change (#1439)
* fix breaking change

* add comment

* add todo
2024-02-06 19:31:35 +01:00
21d8d467dc [docs] Doc maintenance (#1394)
* improvements

* fix name

* feedback

* fix typos

* feedback
2024-02-06 09:01:05 -08:00
e805a3173a [docs] README update (#1411)
* update

* feedback

* feedback
2024-02-06 08:58:42 -08:00
9350ab8a9d FIX Saving models that don't have _name_or_path in config (#1440) 2024-02-06 16:33:15 +01:00
912ad41e96 Fix typos (#1435) 2024-02-06 01:54:06 +01:00
ce925d844a Update bnb.py (#1425) 2024-02-02 08:30:50 +01:00
d1be2696fd poly api (#1422) 2024-02-01 08:47:49 -08:00
4da2876a10 Release patch version 0.8.2 (#1428) 2024-02-01 19:42:43 +05:30
a30e006bb2 fix critical bug in diffusers (#1427) 2024-02-01 13:21:29 +01:00
fff24008eb FIX: Make merging of adapter weights idempotent (#1355)
* FIX Make merging of adapter weights idempotent

Right now, merging of adapters weights such as LoRA and IA³ is not
idempotent. This means that if a user calls merge multiple times, the
resulting weights will be different each time because the delta weights
are added again and again.

This fix checks that only those adapters are merged that are not yet
merged. It also gives a more precise warning:

- Say when there is nothing to merge.
- If there are some adapters to merge, only mention those

This bug is more subtle than it may seem at first, since we sometimes
merge implicitly without the user necessarily being aware of it (e.g.
when calling merge_and_unload). Therefore, this bug can occur quite
easily, even if the user does not explicitly call merge twice in a row.

* Make style
2024-01-31 07:52:04 +01:00
dffde4537f fix: subfolder existence check (#1417) 2024-01-31 11:38:13 +05:30
9d943672c0 Add positional args to PeftModelForCausalLM.generate (#1393)
* add positional args

* update tests
2024-01-30 17:02:39 +05:30
1a7f3e3478 Update custom_models.md (#1409)
In the timm example, MLP() is used where it should be model.

Co-authored-by: boyufan24 <buaafby@126.com>
2024-01-30 17:01:20 +05:30
68b90a14d7 Add IA3 Modules for Phi (#1407)
* Add IA3 Modules for Phi

* Address comments
2024-01-30 17:00:40 +05:30
75e4ef3536 Release v0.8.2.dev0 (#1416) 2024-01-30 16:41:52 +05:30
5e4aa7eb92 Patch Release v0.8.1 (#1415) 2024-01-30 16:11:09 +05:30
5eb5b492af Fix breaking change (#1414)
* fix

* Update src/peft/utils/save_and_load.py

* Update src/peft/utils/save_and_load.py
2024-01-30 10:43:47 +01:00
a2d96d097a Release 0.8.1.dev0 (#1412) 2024-01-30 13:51:27 +05:30
30889ef260 Release: v0.8.0 (#1406) 2024-01-30 11:17:42 +05:30
67918efb49 Fix LoftQ docs (#1408) 2024-01-30 10:09:30 +05:30
189a9a666d add peft type constructor (#1398) 2024-01-29 11:55:01 +05:30
bfc102c0c0 [docs] Task guides (#1332)
* soft prompt guides

* small edits

* feedback

* feedback
2024-01-27 13:39:20 +05:30
1c1c7fdaa6 Fix LoRA module mapping for Phi models (#1375) 2024-01-24 19:24:38 +01:00
4a15595822 Improve documentation for the all-linear flag (#1357)
* added docs for all-linear

* added doc in quantization section

* added doc in lora section

* minor edit

* minor edit
2024-01-22 15:47:45 +01:00
bb2471d926 save the embeddings even when they aren't targeted but resized (#1383) 2024-01-22 20:16:42 +05:30
54ca31153d add mixtral in mapping (#1380) 2024-01-22 09:22:34 +01:00
ebbff4023a account for the new merged/unmerged weight to perform the quantization again (#1370) 2024-01-18 15:39:09 +01:00
62237dc9b1 Handle resizing of embedding layers for AutoPeftModel (#1367)
* handle resizing of embedding layers for AutoPeftModel

* fixes

* add test
2024-01-17 21:02:16 +05:30
eaa5eef28e Added missing getattr methods for mixed model (#1365) 2024-01-17 19:55:49 +05:30
bf54136a79 [docs] Docstring link (#1356)
* fix format

* hmm
2024-01-12 09:00:08 -08:00
a43ec59762 FEAT Add Poly Adapter (#1129)
Implement the Poly (Polytropon) adapter.

Papers:

- https://arxiv.org/abs/2202.13914
- https://arxiv.org/abs/2211.03831

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-01-12 17:19:12 +01:00
0089ebd272 DOC Add PeftMixedModel to API docs (#1354) 2024-01-12 17:29:52 +05:30
fe01d6de85 [Docs] make add_weighted_adapter example clear in the docs. (#1353)
* make add_weighted_adapter example clear in the docs.

* Apply suggestions from code review
2024-01-12 17:25:30 +05:30
f9b673ea37 DOC Extending the vocab and storing embeddings (#1335)
Resolves #1300

Sourab added the feature to store the embedding layers alongside the
adapter in #1147. This PR adds an entry to the documentation to explain
the new feature.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2024-01-12 12:38:49 +01:00
dc28a61e82 FIX Setting active adapter for quantized layers (#1347)
Resolves #1345

See also #1294 for a similar (but incomplete) fix.

This commit fixes the setting of the adapter name on a couple of
quantized layers that was accidentally removed in #1106. This affects
users who use a non-default adapter name when they want to train these
layers.

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2024-01-12 11:55:46 +01:00
71585d611f New transformers caching ETA now v4.38 (#1348)
See #1252 and #1352 for more context.

The initial idea was for transformers 4.37 to add the new caching to all
architectures, but this was postponed to 4.38. The code needs to be
adapted for prompt tuning not to break when transformers 4.37 is
released.
2024-01-12 11:54:53 +01:00
c6bcf91ca1 QOL improvements and doc updates (#1318)
* improve docs and add small utils

* quality

* fix typo

* updates

* quality

* Update src/peft/utils/other.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address comments

* quality

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-01-12 16:18:55 +05:30
4354a7d496 fix prepare_inputs_for_generation logic for Prompt Learning methods (#1352)
* fix `prepare_inputs_for_generation` logic for Prompt Learning methods

* 😅
2024-01-12 16:18:42 +05:30
f36f50acb4 DOC: Update docstring for the config classes (#1343)
* DOC Update docstring for the config classes

Over time, the docstrings of the numerous config classes have not kept
up to date with changes in the code. This PR updates the docstrings to
reflect the current state of the code.

On top of that, multiple small updates have been made:

- Correct wrong or imprecise type annotations.
- More neutral wording of the docstring. E.g., say "adapter" instead of
  "LoRA". This makes it easier to copy&paste the docstrings between
  classes.
- Use same wording for shared arguments.
- Add missing arguments.
- Uniform formatting: Always a line break after the first line of the
  docstring (not mixed, as that can be confusing).
- Fix line lengths to be consistently at 120 characters.
2024-01-12 11:29:39 +01:00
777c0b6ad7 DOC AdaLoraModel.update_and_allocate usage (#1341)
Clarify that this method needs to be called explicitly.
2024-01-11 14:52:14 +01:00
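A sketch of the explicit call inside a training loop; the dataloader, optimizer, and step counter are assumed to exist, and `model` is a PeftModel wrapping an AdaLoraModel:

    for step, batch in enumerate(dataloader):
        outputs = model(**batch)
        outputs.loss.backward()
        optimizer.step()
        # AdaLoRA does not do this automatically: the rank budget must be updated explicitly.
        model.base_model.update_and_allocate(step)
        optimizer.zero_grad()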
6451cbd70c Fix logic in target module finding (#1263)
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-01-10 15:08:00 +01:00
7d28536b18 DOC Correct help for CLI args in script (#1338) 2024-01-10 11:44:25 +01:00
eb2c12d99a ENH Add attribute to show targeted module names (#1330)
This is just a tiny convenience feature to help users understand
which modules are being targeted by the adapter. This can be useful
to quickly check if a complex regex works for `target_modules`.

Note: This should work for all adapters that use BaseTuner, so not only
LoRA but also IA³, LoHa, etc. Only the first two were tested but that
should be enough.
2024-01-10 11:38:40 +01:00
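A quick illustration of the new attribute; the model and the regex are assumptions chosen for the example:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    model = get_peft_model(base, LoraConfig(target_modules=r".*decoder.*(q_proj|v_proj)$"))
    print(model.targeted_module_names)  # fully qualified names of the modules that were wrapped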
c6b28a22b8 DOC Troubleshooting for unscaling error with fp16 (#1336)
Some users ran into the issue of trying to use a model loaded in float16
with mixed precision, e.g. these issues: #341, #1249. This PR documents
a workaround to solve the issue.

I also added tests that demonstrate the issue, as well as the
workaround.

Notes

This is not strictly a PEFT issue, but more a general error when using
AMP with float16. Still, since PEFT users encounter this sometimes, it
is useful to document it.

When we discussed this issue in the past, I think we concluded that it's
not as straightforward as PEFT automatically casting the weights to
float32, though I cannot remember anymore what the drawbacks were.

In any case, should we ever add an automatic solution for this in PEFT,
the added test should fail, which alerts us to the fact that we need to
update the documentation.
2024-01-10 12:08:23 +05:30
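One possible workaround, sketched under the assumption that `model` is the float16-loaded PEFT model about to be trained with AMP: cast only the trainable adapter parameters to float32 while the frozen base weights stay in float16.

    import torch

    for param in model.parameters():
        if param.requires_grad:
            param.data = param.data.to(torch.float32)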
e96eef9ea1 FIX Don't load tokenizer when unnecessary (#1333)
When loading prompt tuning for inference, it is not necessary to load
the tokenizer.
2024-01-09 17:28:57 +01:00
54ee2fb1af Refactor dispatching logic of LoRA layers (#1319)
This PR's goal is to simplify the logic for deciding which LoRA layer
backend is being used when LoRA is applied to a target layer.

Originally, this refactor was done in #1286 which was about adding the
"fast" backend for LoRA, but since that PR was closed, I moved the
refactor to this dedicated PR.

Motivation

Right now, the LoraModel._create_new_module method has become quite
complex and hard to read, spanning >100 lines:

8665e2b571/src/peft/tuners/lora/model.py (L235-L339)

The reason for this is that this method contains the logic for deciding which
LoRA layer backend to use for all the different types of LoRA layers
that we have, i.e. normal Linear layer, Conv2d layer, bnb layer, gptq,
etc.

Description

To remedy this, I moved the logic for deciding which layer to match to
the respective implementation of the layers. For example, in
lora/layer.py, there is now a function called dispatch_default, whose
responsibility it is to decide if an Embedding layer, Conv2d layer or
Linear layer is the right match. Similarly, in lora/bnb.py, there are
now the two functions dispatch_bnb_8bit and dispatch_bnb_4bit to decide
what/if any bnb 8bit or 4bit layer should be matched.

This way, the logic to decide what layer to match now resides next to
the respective layers. The only thing that LoraModel now needs to do is
to collect all the dispatching methods and use the first layer that
matches.

Note that only LoRA was modified, the other tuners don't have different
backends and thus this approach was not necessary for them. The only
exception is IA³, which has the normal and bnb backend. Since those are
only 2, it's not as complicated as for LoRA, but if this PR is accepted,
I can refactor IA³ in a similar fashion.

Other changes

- Removed the optional_kwargs argument from _create_and_replace, as it
  was an unnecessary indirection.
- Removed the bias argument from kwargs, as it was not used.

Backwards compatibility

This should be fully backwards compatible, as the constructed LoRA model
is 100% the same. If there are users that override _create_new_module,
their code will probably break, but since this is a private method, we
should be fine.
2024-01-09 12:18:31 +01:00
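A generic, self-contained illustration of the dispatcher pattern described above (deliberately simplified; not PEFT's actual code):

    import torch.nn as nn

    def dispatch_embedding(target, **kwargs):
        # Hypothetical dispatcher: claims nn.Embedding targets, otherwise defers by returning None.
        return "EmbeddingLoRA" if isinstance(target, nn.Embedding) else None

    def dispatch_default(target, **kwargs):
        return "LinearLoRA" if isinstance(target, nn.Linear) else None

    def create_new_module(target):
        # The model class only collects the dispatchers; the first non-None result wins.
        for dispatch in (dispatch_embedding, dispatch_default):
            new_module = dispatch(target)
            if new_module is not None:
                return new_module
        raise ValueError(f"No backend available for {type(target).__name__}")

    print(create_new_module(nn.Linear(4, 4)))  # -> "LinearLoRA"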
cbd783b4df Add an option 'ALL' to include all linear layers as target modules (#1295)
* added helper function to get list of all linear layers; added tests and updated documentation

* added bnb tests

* fixed issues with t5

* style issues

* improved lora and ia3 docs

* fixed code to work for any output embedding layer name

* style changes

* added a test for a base model without lm head

* added comments

* address review comments

* update tests

* update tests

* minor simplification

* changed argument to all_linear

* minor fix to configs

* minor edit

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address review comments

* added test for diffusion models

* minor edits to configs

* spelling correction

* Update tests/test_tuners_utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update src/peft/tuners/tuners_utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update src/peft/tuners/tuners_utils.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address review comments

* revert back to older decorator order

* style changes

* simplify logic for bnb layers

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-01-09 16:19:58 +05:30
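A minimal sketch of the shortcut; in current PEFT releases it is spelled "all-linear" and targets every linear layer except the output head:

    from peft import LoraConfig

    config = LoraConfig(target_modules="all-linear")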
26504a0119 Extend merge_and_unload to offloaded models (#1190)
* activated pre-forward

* activated pre-forward hook

* activated pre-forward hook

* activated pre-forward hook

* debugged hook call

* added explicit forwards

* debugged

* debugged

* fixed pre-forward hook call

* fixed pre-forward hook call

* debugged module iteration

* fixed post forward args

* added conditional attr check

* fixed conditional attr check

* memory overflow debug

* memory overflow debug

* added mem trace

* added mem trace

* more memory traces

* debug memory leak

* debug memory leak

* removed replace

* removed device assign during replacement

* no grad during replacement

* new module hook

* to cpu

* to cpu

* removed replace module

* conditional on replace module

* removed traces

* make style

* added back replace_module

* added test and make style

* inline key, module

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fixed test and make style

* reverted _unload_and_optionally_merge and moved test

* match main

* make style

* reverted model.py

* make style

* reverted merge

* fetched model.py from head

* added onload

* debug

* removed replace module

* removed replace module

* pre forward on target and parent

* removed _replace_module

* reverted

* debugged

* debugged

* traced adapters

* debugged

* added trace on adapter names

* onloaded target

* further traces

* further traces

* further traces

* further traces

* further traces

* onloaded adapters

* onload module

* onload module

* onload module

* debugged

* debugged

* debugged

* removed delta weight onload

* revamped delta weight onload

* revamped delta weight onload

* removed replace module

* added parent and target act

* debugged

* debugged

* added traces

* added traces

* added traces

* init hook

* init hook

* traces

* traces

* specd weights map

* removed traces and offload check

* post forwards on lora

* added post forward for target and parent

* added trace

* removed traces and tp post forwards

* added onloads and offloads to embedding and conv2d

* updated test

* make style

* debugged and make style

* refactored and make style

* cleaned

* refactored and make style

* cleaned

* cleaned

* make style

* make style

* disk offload compatibility

* refactored linear onload via contextmanager

* refactored onloads

* debugged

* tempfile to tempfolder

* changed disk offload to original directory

* refactored for general tuners

* debugged

* explicit base layer

* added base traces

* more traces

* debugged;

* reverted lora layer.py

* removed traces and make style

* cleaned

* removed todo

* fixed test and cleaned

* added suggestions and make style

* onload for unmerge and merge_and_unload

* improved docstring

* onload target only and make style

* Update src/peft/tuners/tuners_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* revised descriptions

* make style

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-01-09 06:31:30 +01:00
4186c9b104 FIX Use torch.long instead of torch.int in LoftQ for PyTorch versions <2.x (#1320)
Solves #1307

For PyTorch < v2.x, using torch.int does not work for indexing, thus
using torch.long.
2024-01-08 10:45:12 +01:00
8665e2b571 fix diffusers tests (#1317)
* fix diffusers tests

* quality
2024-01-03 20:05:06 +05:30
cbf346d962 fix the embedding saving for adaption prompt (#1314)
* fix the embedding saving for adaption prompt

* fix

* automate setting `save_embedding_layers` when embedding layer is resized during finetuning

* fix

* address comment

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* oops

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-01-03 15:22:26 +05:30
2a0fb71f4f Mistral IA3 config defaults. (#1316) 2024-01-03 01:59:31 +05:30
c4cf9e7d3b FIX Set active adapter in bnb lora layers init (#1294)
Was accidentally removed in #1106
2024-01-02 13:02:42 +01:00
cf04d0353f [BNB] fix dockerfile for single gpu (#1305) 2023-12-27 15:41:33 +01:00
4023da904f fix fsdp auto wrap policy (#1302)
* fix fsdp policy

* fix fsdp

* revert

* refactor to be inline with Accelerate
2023-12-27 14:43:27 +05:30
6fe1aac65d [BNB] Fix bnb dockerfile for latest version (#1291)
* fix docker

* fix

* Update .github/workflows/nightly-bnb.yml

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-26 14:28:39 +01:00
799420aef1 Update nightly-bnb.yml (#1287) 2023-12-22 17:59:26 +01:00
993836ff90 DOC Improve target modules description (#1290)
For LoRA and IA³, it is allowed to not specify a target module, in which
case the correct layers are derived from the model architecture. This
was not documented so far.
2023-12-21 17:00:09 +01:00
1c9679ac71 [docs] Concept guides (#1269)
* concept-docs

* mpt and llama-adapter

* review

* feedback

* toctree

* Update docs/source/conceptual_guides/adapter.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-20 10:56:02 -08:00
e745ffd7d0 FIX Errors in StableDiffusion adapter conversion script (#1281) 2023-12-20 12:00:05 +01:00
029dcd5a1c [bnb] Add bnb nightly workflow (#1282)
* add bnb nightly workflow

* add matrix strategy

* temp

* oops

* temp

* oops

* nit

* fixes

* up

* up

* up

* add pytest cov

* up

* oops

* put correct dir

* fix

* fix dir in makefile + failing test

* revert

* Update .github/workflows/nightly.yml

* Update nightly-bnb.yml

* Update log_reports.py

* Update Makefile

* Update .github/workflows/nightly-bnb.yml

* Update .github/workflows/nightly-bnb.yml

* Update .github/workflows/nightly.yml

* Update nightly.yml

* Update .github/workflows/nightly-bnb.yml

* Update nightly-bnb.yml
2023-12-20 10:49:13 +01:00
482a2a6d9a TST Enable LoftQ 8bit tests (#1279)
Due to PR #1276, the bug that prevented use of LoftQ with 8bit
quantization has now been fixed. Therefore, the tests no longer need to
be skipped.
2023-12-18 17:29:33 +01:00
119de1715c [Tests] Add bitsandbytes installed from source on new docker images (#1275)
* add bnb from source dockerfile

* Update build_docker_images.yml

* Update build_docker_images.yml

* minor refactor
2023-12-18 15:15:43 +01:00
a0a46c06db Refactor and a couple of fixes for adapter layer updates (#1268)
* Refactor: Move LoRA update_layer to child classes

For LoRA, so far, we have update_layer for Linear,
update_layer_embedding for Embedding, and update_layer_conv2d for
Conv2d, all defined on LoraLayer.

We can simplify the code by always using the name update_layer, and by
moving the layer-specific methods to the subclasses. So e.g.
update_layer_embedding is moved to the Embedding class and renamed to
update_layer. This way, the caller does not need to differentiate which
type of layer it's calling.

Interestingly, this was already practiced for IA³, so the same change
was not necessary there. But I did find the same method implemented
twice, once on IA3Layer and once on Linear, so I removed one of the
duplicates.

* Systematic handling of r (rank) <= 0

Always raise an error when r <= 0, not only for LoRA. Also, removed
later check for r > 0 in LoRA layers, since we already check for r <= 0.

* Fix broken __repr__ method on QuantLinear

Was indented too deep, thus not being applied.

* Fix bug for updating Lora GPTQ and IA3 bnb layers

Before this fix, when adding a 2nd adapter to a model, we did not
correctly check if there was already an adapter layer in the model when
dealing with LoRA GPTQ or IA3 bnb layers. As a consequence, instead of
updating the existing layers, we would create a new layer and the
existing layer would be set as the base_layer of that new layer. Now, we
correctly update the existing layer to add the new adapter.

Note that for this fix to work correctly with LoRA and GPTQ, I had to
add a check for qweight, since we only checked for weight before.

Tests were added to check this. They fail with the current main but are
fixed with this PR.

* Don't match AdaLoraLayer when updating LoraLayers

AdaLoraLayer is a subclass of LoraLayer, so just checking for
isinstance(target, LoraLayer) will match AdaLoraLayer, which we don't
want when it comes to updating a LoraLayer. Now, we explicitly check
that the layer is *not* an instance of AdaLoraLayer.
2023-12-18 10:59:17 +01:00
3708793ba9 TST Extend LoftQ tests to check CPU initialization (#1274)
Tests to complement PR #1256
2023-12-18 10:37:48 +01:00
46a84bd395 LoftQ: edit README.md and example files (#1276)
* fix when num_bits == 2 or 8

* try 13b
2023-12-17 15:21:25 +01:00
bd544bb2ce LoftQ: Allow quantizing models on CPU (#1256) 2023-12-15 16:43:33 +01:00
55c37e9c0b feat: add apple silicon GPU acceleration (#1217)
* feat: add apple silicon GPU acceleration

* Fix device compatibility issue in
load_peft_weights function

* Update save_and_load.py

* Update save_and_load.py

* Update save_and_load.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/peft/utils/save_and_load.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Fix string formatting in image_classification_timm_peft_lora.ipynb and multilayer_perceptron_lora.ipynb

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-12-15 13:05:06 +01:00
997e6ec5ab ENH Rank-stabilized LoRA scaling option (#1244)
Add option to scale LoRA weights by alpha/sqrt(r) by passing
LoraConfig(..., use_rslora=True).

https://doi.org/10.48550/arXiv.2312.03732
2023-12-15 12:16:59 +01:00
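A minimal sketch of enabling the rank-stabilized scaling:

    from peft import LoraConfig

    config = LoraConfig(r=64, lora_alpha=16, use_rslora=True)  # scaling becomes alpha / sqrt(r) instead of alpha / r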
ddb114af0a remove a duplicated description (#1271)
remove duplicated description for _check_target_module_exists in BaseTuner class
2023-12-15 11:04:29 +01:00
4b02148af2 TST Revert device_map for AdaLora 4bit GPU test (#1266)
This was recently added in #1242 but fails on CI with single GPU.
2023-12-14 11:41:31 +01:00
0f1e9091cc Fix ModulesToSaveWrapper __getattr__ (#1238)
* Update other.py

* Update other.py

* Update test_low_level_api.py
2023-12-13 12:52:56 +01:00
88e2e75cc3 FIX Error in log_reports.py (#1261)
Silly mistake...
2023-12-13 10:50:05 +01:00
c9df262d69 Bump version to 0.7.2.dev0 post release (#1258) 2023-12-12 18:30:41 +01:00
67a08009ff Release: 0.7.1 (#1257)
Also fix some more seeds to prevent flakiness
2023-12-12 17:53:36 +01:00
971dd6e815 Fix: Multiple adapters with bnb layers (#1243)
Resolves #1239

Fixes a bug that led to an error when loading multiple adapters into a
peft model that uses bnb layers.

Also: Fix for loading 2nd adapter with AutoGPTQ
2023-12-12 15:34:45 +01:00
ee6f6dcee7 FIX Issues with transformers 4.36 (#1252)
Adjust for different type of past_key_values when using caching.

Also: Fix some seeds for flaky tests.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2023-12-12 15:16:00 +01:00
21c304f6f6 FIX Truncate slack message to not exceed 3000 char (#1251)
Should fix the issue of not receiving slack notifications because the
message is too long, see:

https://github.com/huggingface/peft/actions/runs/7148379741/job/19469273483

Currently, we get:

> Error: ver responded with: {'ok': False, 'error': 'invalid_blocks', 'errors': ['failed to match all allowed schemas [json-pointer:/blocks/1/text]', 'must be less than 3001 characters [json-pointer:/blocks/1/text/text]'], 'response_metadata': {'messages': ['[ERROR] failed to match all allowed schemas [json-pointer:/blocks/1/text]', '[ERROR] must be less than 3001 characters [json-pointer:/blocks/1/text/text]']}}

Fixing the error should also lead to a shorter message, but we should
ensure that even if the message is too long, we still get it.
2023-12-12 11:05:48 +01:00
e73967edea [docs] Quantization (#1236)
* first draft

* feedback

* update api doc

* feedback
2023-12-11 08:48:06 -08:00
b08e6faf2b TST: Add tests for 4bit LoftQ (#1208)
Add GPU tests for LoftQ with 4bit quantization.

Notes

Tests for 8bit quantization are already there but not run at the moment,
see this comment:

https://github.com/huggingface/peft/pull/1150#issuecomment-1838891499

In my testing, 8bit passes when using NFQuantizer, so if the original
author is fine with using that, I can make the adjustment.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-11 15:34:36 +01:00
5c13ea3b12 FIX Use model argument consistently (#1198) (#1205)
Some methods were using model and self.model interchangeably. This was
fine, as they were referring to the same object, but is also confusing.
Now model is used consistently.
2023-12-11 12:35:28 +01:00
00b820061e Revert "FIX Pin bitsandbytes to <0.41.3 temporarily (#1234)" (#1250)
This reverts commit 86562eec49bede2f4525be343f642af8fb46ddbc.
2023-12-11 12:11:18 +01:00
504d3c8329 [docs] PEFT integrations (#1224)
* rough draft

* remove

* feedback

* fix image links and doc references

* resolve links manually

* use internal link
2023-12-08 13:01:37 -08:00
fc9f4b3176 Bnb integration test tweaks (#1242)
* allow bitsandbytes integration test selection

* fix typo: mutli -> multi

* enable tests to run on >2 GPUs

* fix for >3 GPUs, due to artidoro/qlora #186

* fix formatting
2023-12-08 13:20:13 +01:00
895513c465 TST: Add tolerance for regression tests (#1241)
Tests currently call torch.allclose without any tolerance, which is
probably the cause of the CI failure. Now, tolerance is set to 1e-4.
2023-12-08 11:50:48 +01:00
c893394808 [docs] PeftConfig and PeftModel (#1211)
* rough draft

* feedback

* feedback
2023-12-07 14:22:26 -08:00
86562eec49 FIX Pin bitsandbytes to <0.41.3 temporarily (#1234)
Some tests are failing with bitsandbytes 0.41.3:

python -m pytest -m single_gpu_tests tests/test_common_gpu.py -k
test_4bit_merge

For the time being, use the next smaller version.
2023-12-07 16:46:15 +01:00
b467e3de5c Lazy import of bitsandbytes (#1230)
Previously, we imported from bitsandbytes eagerly if the package was
installed. This caused two major issues:

- Slow loading time of PEFT (~4 sec)
- Errors with multiprocessing because bnb initializes CUDA

This commit fixes both issues by importing bitsandbytes lazily. PEFT
import time is now reduced to ~2sec.

Notes

Implementation-wise, I use a combination of local imports and
module-level __getattr__. The latter was introduced in Python 3.7 and
should therefore be safe to use.
2023-12-07 16:39:08 +01:00
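A generic sketch of the module-level __getattr__ pattern (PEP 562) used for such lazy imports; this is illustrative and not PEFT's actual code:

    # Hypothetical module, e.g. my_package/bnb_utils.py: bitsandbytes is only
    # imported the first time one of its attributes is actually requested.
    import importlib

    def __getattr__(name):
        if name == "Linear8bitLt":
            bnb = importlib.import_module("bitsandbytes")  # deferred heavy import
            return bnb.nn.Linear8bitLt
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")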
2ab005f3ab TST Run regression test in nightly test runner (#1233)
Follow up to #1115
2023-12-07 15:11:40 +01:00
b482391b80 Don't set config attribute on custom models (#1200)
Initially, we had the issue that it was sometimes assumed that models
had a config attribute, as is given for transformers models. This made
PEFT fail with custom models, so we made a change to set a dummy config
on those.

However, this can lead to issues down the line. For example, when users
use the Trainer class from transformers, they can stumble upon lines
like this:

62ab32b299/src/transformers/integrations/integration_utils.py (L636-L637)

62ab32b299/src/transformers/integrations/integration_utils.py (L729-L730)

Here transformers assumes that if config attribute exists on the model,
it must have a to_json_string method or a to_dict method (as it assumes
the config to be a PretrainedConfig instance). Therefore, in order not
to trip up transformers, it is best not to set any config at all.

Alternative

Alternatively, transformers could be changed to check each time when the
config attributes exists, if it is a PretrainedConfig instance, but that
would be a much larger change (albeit a cleaner one).
2023-12-07 10:56:21 +01:00
d56df7fc64 Bump version to 0.7.1.dev0 post release (#1227)
Also updated the release instruction for installing from pypi, as the
previous command seems to be causing trouble recently (see internal
discussion).
2023-12-06 19:04:13 +01:00
a87ff4c744 [docs] OFT API docs (#1221) 2023-12-06 16:26:21 +01:00
2665f80a17 Release: 0.7.0 (#1214)
In preparation for the 0.7.0 release. Also remove obsolete TODO
comments.
2023-12-06 15:11:00 +01:00
9fd788bedb TST: Add regression tests 2 (#1115)
Description

In general, for regression tests, we need two steps:

1. Creating the regression artifacts, in this case the adapter
   checkpoint and the expected output of the model.
2. Running the regression tests, i.e. loading the adapter and checking
   that the output of the model is the same as the expected output.

My approach is to re-use as much code as possible between those two
steps. Therefore, the same test script can be used for both, with only
an environment variable to distinguish between the two. Step 1 is
invoked by calling:

`REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`

and to run the second step, we call:

`pytest tests/regression/test_regression.py`

Creating regression artifacts

The first step will create an adapter checkpoint and an output for the
given PEFT version and test setting in a new directory. E.g. it will
create a directory `tests/regression/lora_opt-125m_bnb_4bit/0.5.0/` that
contains adapter_model.bin and output.pt.

Before this step runs, there is a check that the git repo is clean (no
dirty worktree) and that the commit is tagged (i.e. corresponds to a
release version of PEFT). Otherwise, we may accidentally create
regression artifacts that do not correspond to any PEFT release.

The easiest way to get such a clean state (say, for PEFT v0.5.0) is by
checking out a tagged commit, e.g:

`git checkout v0.5.0`

before running the first step.

The first step will also skip the creation of regression artifacts if
they already exist.

It is possible to circumvent all the aforementioned checks by setting
the environment variable `REGRESSION_FORCE_MODE` to True like so:

`REGRESSION_FORCE_MODE=True REGRESSION_CREATION_MODE=True pytest tests/regression/test_regression.py`

You should only do this if you know exactly what you're doing.

Running regression tests

The second step is much simpler. It will load the adapters and the
output created in the first step, and compare the output to the output
from a new PEFT model using the loaded adapter. The outputs should be
the same.

If more than one version is discovered for a given test setting, all of
them are tested.

Notes

Regression artifacts are stored on HF Hub.
2023-12-06 15:07:05 +01:00
2336780f9e Raise error when modules_to_save is specified and multiple adapters are being unloaded (#1137)
* handle `modules_to_save` when unloading

* address comments

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* quality

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-12-06 19:14:58 +05:30
c22a8e5d47 DOC: How to configure new transformers models (#1195)
I believe that new transformers architectures could be the most common
case of users wanting to apply PEFT on a model that is not supported out
of the box. Thus I added a section specifically to help users configure
their configs for new transformers models.

As I wanted to point users to a single file that contains all the
existing transformers models, I added a new file
`src/peft/utils/constants.py`, which contains all the mappings that
previously lived in `src/peft/utils/other.py`. LMK if that makes sense.

Notes

To be absolutely backwards compatible, I re-imported the moved constants
into `other.py`. This way, if there is code that imports them directly
from there, it should continue to work.

To avoid getting a linter error for unused imports, I added those
constants to the `__all__` list in `other.py`.

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-05 18:51:12 +01:00
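A minimal sketch of what such a config could look like for an architecture without a default mapping; the module names below are placeholders, so inspect the model (e.g. print(model)) to find the layers to actually target:

    from peft import LoraConfig

    # Hypothetical projection names; pick the ones your architecture uses.
    config = LoraConfig(target_modules=["q_proj", "v_proj"])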
1a7433b136 TST Improve test for SD LoHa and OFT (#1210) 2023-12-05 18:12:39 +01:00
70d559d029 DOC Initialization options for LoRA (#1218)
Document the initialization options for LoRA. This is especially
important for LoftQ, since otherwise it may not be obvious to users how
to make use of it.
2023-12-05 18:01:47 +01:00
bffbbbf76a MNT Delete the delete doc workflows (#1213)
They are failing because the corresponding GH action no longer exists.

See discussion in #open-source-internal
2023-12-05 13:21:28 +01:00
9c70468a3c [docs] API docs (#1196)
* first draft

* fix path

* fix all paths

* typo

* last typo 🤞

* fix toctree

* typo

* fix section title

* feedback

* update
2023-12-04 11:45:26 -08:00
f7cf460f7c [docs] Update index and quicktour (#1191)
* first draft

* fix toctree

* lora sub-section

* feedback

* iframe height

* feedback
2023-12-04 11:00:29 -08:00
1b1091c158 remove HF tokens (#1207) 2023-12-04 15:15:19 +01:00
c456d55216 DOC: Update & improve docstrings and type annotations for common methods and classes (#1201)
The docstrings of the most user-exposed methods and classes have been
updated, or added if not already present. Furthermore, type annotations
have been updated or added for those methods and classes.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-04 12:22:03 +01:00
e05b2670c5 ENH: Enable OFT adapter for mixed adapter models (#1204)
This PR makes it possible to use the newly added OFT adapter in mixed
adapter type models, similar to LoRA, LoHa, etc.

Notes

Adding the integration was pretty straightforward, which is a good sign.

The difficult part was actually about the tests. This stems from the
fact that OFT is (if my understanding is correct) never commutative.
What I mean is that even if the adapters are applied to the last layer
of a model, it makes a difference whether we apply, say, first LoRA,
then OFT vs first OFT, then LoRA.

This is different for the other adapters that were added so far for
mixed models, as they basically do:

- Xa = X + dXa
- Xab = Xa + dXb = X + dXa + dXb = X + dXb + dXa = Xb + dXa = Xba

This is not true for OFT, so when OFT is used, I had to ensure
that no test was applied that (implicitly) assumes commutativity.

Furthermore, I had to increase the model size, see this comment:

https://github.com/huggingface/peft/pull/1160#issuecomment-1836107235
2023-12-04 12:18:49 +01:00
5ed46e4f04 FIX Issue with megatron parallel linear lora (#1202) 2023-12-04 12:16:58 +01:00
5bad88ba04 [DOCS] README.md (#1054)
minor fixes
2023-12-04 11:53:40 +01:00
6a57472665 Mixed adapter models (#1163)
Description

This PR allows to add adapters of different types, e.g. LoRA and LoHa:

base_model = ...
config0 = LoraConfig(...)
peft_model = get_peft_model(base_model, config0, mixed=True)
config1 = LoHaConfig(...)
peft_model.add_adapter(config1, "other")
peft_model.set_adapter(["default", "other"])
peft_model(x)

At this point, both adapters are active at the same time.

Existing code should not be affected by this change, since users need to
opt into this behavior by setting mixed=True, and a completely different
class is being used (PeftMixedModel).

Also interesting is that this method can be used for a single adapter
type but with very different configs. Right now, we have limited support
for that (e.g. for LoRA, different r values by using rank_pattern), but
with this, we don't need to special case the differing arguments
anymore.

Not implemented

- [ ] I'm not yet sure if the same logic can be applied to IA³ or if it
  may fail because IA³ can apply its scaling to the input, not the output.
- [ ] OFT is not supported yet but should work.
- [ ] It is currently not possible to represent a mixed adapter model as
  a single config. I think we can come up with a solution but I don't
  think it is necessary for a first version of this.
- [ ] Saving and loading is not yet implemented for mixed models.

Those could potentially be added in a future PR.

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-11-30 21:58:16 +01:00
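A fleshed-out sketch of the pseudocode above, assuming a small toy model; the add_adapter argument order follows the snippet in the commit message (it may differ in later releases), and the target module name "0" is specific to this toy nn.Sequential:

    import torch
    from torch import nn
    from peft import LoraConfig, LoHaConfig, get_peft_model

    base_model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))
    config0 = LoraConfig(target_modules=["0"])
    peft_model = get_peft_model(base_model, config0, mixed=True)

    config1 = LoHaConfig(target_modules=["0"])
    peft_model.add_adapter(config1, "other")
    peft_model.set_adapter(["default", "other"])  # both adapters active at once
    out = peft_model(torch.randn(1, 10))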
da17ac0f48 [Feature] Support OFT (#1160)
* Support OFT

* add test

* Update README

* fix code quality

* fix test

* Skip 1 test

* fix eps rule and add more test

* feat: added examples to new OFT method

* fix: removed wrong arguments from model example

* fix: changed name of inference file

* fix: changed prompt variable

* fix docs

* fix: dreambooth inference revision based on feedback

* fix: review from BenjaminBossan

* apply safe merge

* del partially

* refactor oft

* refactor oft

* del unused line

* del unused line

* fix skip in windows

* skip test

* Add comments about bias added place

* rename orig_weights to new_weights

* use inverse instead of linalg.inv

* delete alpha and scaling

---------

Co-authored-by: Lukas Kuhn <lukaskuhn.lku@gmail.com>
Co-authored-by: Lukas Kuhn <lukas.kuhn@deutschebahn.com>
2023-11-30 21:28:42 +05:30
2674f5ea66 Megatron distributed parallel linear LoRA (#1092)
Adds option to use Megatron's ColumnParallelLinear and RowParallelLinear
for LoRA linear layers, leading to improved performance when using LoRA
with Megatron.
2023-11-30 16:24:58 +01:00
2b901ee572 Add LoftQ initialization method for LoRA (#1150)
---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-11-29 17:08:17 +01:00
8298f1a366 Training PEFT models with new tokens being added to the embedding layers and tokenizer (#1147)
* add support for saving base layers weights along with adapter weights

* Update save_and_load.py

* Add an example showing the usage of the added feature

* refactor the functionality

* fix

* refactoring code

1. Add `is_embedding_layer_resized` parameter to `save_pretrained`
2. Fix the deduplication in README when adding PEFT details.
3. `save_pretrained` should only save the model when `is_main_process=True` which is one of the parameters of `save_pretrained`.

* update example

* fix the model card

* fix model card

* 😅

* fix model card

* automate setting `is_embedding_layer_resized`

* nits

* Update peft_lora_clm_with_additional_tokens.ipynb

* add test

* fix tests

* maybe fixes the issue?

* address comments

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-11-29 19:28:41 +05:30
f0fb9516d8 ENH: Different initialization methods for LoRA (#1189)
This PR adds the possibility to use different initialization methods for
LoRA, as is a requirement for a completely backwards compatible adoption
of PEFT in diffusers.

The default is still the same as always, namely the one from the
reference implementation by Microsoft. On top of that, it is now
possible to pass `init_lora_weights='gaussian'` to initialize the LoRA
weights in the same way as is default for diffusers, namely with a
normal distribution which is scaled by 1/r.

The init method currently applies to LoRA linear and conv layers, but
not embedding layers, which are always initialized from a normal
distribution (and are probably irrelevant for diffusers).

In the future, similar extensions could be added for other adapter
methods.
2023-11-29 12:37:39 +01:00
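A minimal sketch of selecting the new init method (values illustrative):

    from peft import LoraConfig

    # 'gaussian' draws the LoRA weights from a normal distribution scaled by
    # 1/r, matching the diffusers default described above; omit the argument
    # to keep the reference (Microsoft) initialization.
    config = LoraConfig(r=8, init_lora_weights="gaussian")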
04c411010b Examples: add options to save or push model (#1159) 2023-11-28 16:04:52 +01:00
da29ae62d4 ENH Add support for phi model architecture (#1186) 2023-11-28 14:43:06 +01:00
64c8d1da85 FIX Pass HF token when calling PeftModel.from_pretrained (#1076) 2023-11-28 14:17:25 +01:00
e586f96740 DOC Update a few places in the README (#1152)
- fix bits_and_bytes => bitsandbytes
- add a few links
- add mistral to list of supported models
---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-11-28 11:04:57 +01:00
e35d46de19 Fix code example in quicktour.md (#1181) 2023-11-27 22:29:11 +01:00
b4faffea8a [Tests] Migrate to AWS runners (#1185)
* migrate single-gpu runners

* Update nightly.yml

* Update nightly.yml

---------

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
2023-11-24 18:40:19 +01:00
19145bba8a FIX Wrong use of base layer (#1183)
This is important if we have nested adapter layers. This was an overlook
during the refactoring #1106.
2023-11-24 17:03:59 +01:00
c0dd27bc97 Fix dockerfile build (#1177)
* Update Dockerfile

* Update build_docker_images.yml

* Update Dockerfile

* Update build_docker_images.yml
2023-11-23 15:40:35 +01:00
fb607d00ad DOC convert mdx to md (#1171)
Content can still technically be mdx but mdx is not rendered well on
GitHub, so this makes reviewing doc files easier.
2023-11-23 11:38:57 +01:00
a634f6a13e Update release checklist about release notes (#1170)
Add a reminder in the release checklist to consult the release note
google doc.
2023-11-23 10:35:53 +01:00
dd4771b2f4 (minor) correct type annotation (#1166)
* add correct type annotation

* make style
2023-11-22 20:52:26 +01:00
043238578f fix add_weighted_adapter method (#1169)
* fix `add_weighted_adapter` method

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-Authored-By: jihuishan <151612440+jihuishan@users.noreply.github.com>

* Update testing_common.py

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: jihuishan <151612440+jihuishan@users.noreply.github.com>
2023-11-22 17:44:21 +05:30
b4ac2d840b FIX Dataset loaded twice in 4-bit finetuning script (#1164) 2023-11-22 12:23:50 +01:00
0ae52fece1 [Docs fix] Relative path issue (#1157) 2023-11-21 10:57:56 +01:00
8351331d78 ENH Delete IA3 adapters (#1153) 2023-11-20 18:22:52 +01:00
f1ecfa6ae6 Use huggingface_hub.file_exists instead of custom helper (#1145)
* Use 'huggingface_hub.file_exists' instead of custom helper

* make quality
2023-11-17 15:48:02 +01:00
b5a8a294ed FIX A few issues with AdaLora, adding tests (#1146)
This PR fixes a handful of issues with AdaLora, should resolve #1113.

Description

1. lora_A.weight.device was called but for AdaLora, lora_A is an
   nn.Parameter, not an nn.Module, so the weight attribute does not
   exist. lora_A.device is sufficient.
2. For 8bit, an inplace operation failed because it was on a view. Now
   the operation is no longer inplace.
3. The loss term of the model output is not necessarily a torch tensor.
   In the test, it was a dict and did not contain an actual loss.
   Therefore, I added a check to make sure the loss is a torch tensor.
2023-11-17 15:18:34 +01:00
9cdaed2769 CI Add Python 3.11 to test matrix (#1143)
The only required change was to call .value on some enums when used in
messages, as their repr has changed in Python 3.11.
2023-11-17 14:11:54 +01:00
18a0910113 [Tests] Do not stop tests if a job failed (#1141)
* Update nightly.yml

* Update nightly.yml
2023-11-16 18:11:19 +01:00
99e1a55f54 [core / LoRA] Add adapter_names in bnb layers (#1139)
* Update bnb.py

* fix style
2023-11-16 17:12:39 +01:00
21df968fd1 [Tests] Fix daily CI (#1136)
* fix daily CI

* adapt from suggestion
2023-11-16 14:43:36 +01:00
5a3a5acff2 Refactor base layer pattern (#1106)
Description

Refactor all tuners (where it applies, i.e. not prompt tuning) to use
the "base layer pattern". This means that the adapter layer will always
hold a reference to the original layer that it modifies. This pattern is
already partly used (e.g. LoRA bnb, gptq layers), now it is consistently
used everywhere when applicable.

This PR is a companion PR to #1069, where I first added these changes.
They are now extracted to a separate PR to make code review easier and
to advance more quickly.

Implementation

The main change is that the adapter layer wraps the original layer and
calls forward on that layer, instead of doing stuff like this:

F.linear(input, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)

which completely circumvents the call to the target layer's forward
method. With the base layer pattern, we now call the target layer's
forward method. Therefore, if the target layer is another adapter
layer (which will be crucial for mixed adapters), we call its forward
method correctly. Also, this should allow passing extra arguments, like
lora_scale to forward.

This change has the nice side benefit that we no longer need to use
_init_empty_weights -- in fact, we don't initialize any of the target
layer's weights anymore, since we have a reference to it. There is thus
no risk of having slow but superfluous initialization of layers.

Moreover, I could greatly simplify merge_and_unload by just using the
base_layer instead of having to create a completely new layer. For
OPT-350m, this results in a 15x speedup.

Note that, same as for the bnb layers, this should be backwards
compatible, since the adapter weights and their state_dicts are not
affected by this change. I used #1115 for regression testing.

Somewhat unrelated changes

During debugging, I got very annoyed with the fact that the reprs of
adapter layers and normal PyTorch layers are hard to distinguish, e.g.
the type is just "Linear". Now, for adapter layers, it is prefixed by
the adapter type, e.g. "lora.Linear". This should have no further
implications except for the repr (e.g. state_dict remains unaffected).

For LoHa and LoKr, I had to change the init of weights when using
init_weights=False. This is because of what is discussed in Numerical
instabilities with LoHa #1058.

IA³ now has the unload method too.

LoHa and LoKr now support safe_merge=True when merging layers.

Migration guide

For 99% of users, the code should continue working as usual, because
the API stays the same. Only low level details have been changed.

Code that relies on isinstance checks on specific PEFT classes may
break. E.g. the LoRA Linear layer no longer inherits from nn.Linear. It
is, however, still a BaseTunerLayer. The same logic applies for other
layer types like Conv2d and for other tuners like IA³.

To retrieve the base layer of an adapter layer, you should now call
module.get_base_layer() if you deal with a BaseTunerLayer. Don't rely on
something like module.weight being present (though it might be).
2023-11-16 12:45:12 +01:00
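An illustrative, self-contained sketch of the base layer pattern (not PEFT's actual classes): the adapter layer keeps a reference to the wrapped layer and delegates to its forward method instead of re-implementing it:

    import torch
    from torch import nn

    class ToyLoraLinear(nn.Module):
        # Toy adapter layer that wraps an existing linear layer.
        def __init__(self, base_layer: nn.Linear, r: int = 8):
            super().__init__()
            self.base_layer = base_layer  # reference only, no re-initialization
            self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
            self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)
            nn.init.zeros_(self.lora_B.weight)  # adapter starts as a no-op

        def get_base_layer(self) -> nn.Module:
            return self.base_layer

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Call the wrapped layer's forward instead of F.linear on its
            # weight, so nested adapter layers keep working.
            return self.base_layer(x) + self.lora_B(self.lora_A(x))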
70302d7b4f FEAT: Merging only specified adapter_names when calling merge (#1132)
* working v1

* add tests

* remove

* add it also for lokr and loha, left a todo

* Update tests/testing_common.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* better test

* up

* fix tests

* credits contrib and suggestions from discussions

* credits contrib and suggestions from discussions

* address last comments

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-11-16 12:05:22 +01:00
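A hedged usage sketch, assuming a LoRA peft_model with adapters "default" and "other" already attached:

    # Only the listed adapters are merged into the base weights; adapters not
    # listed are left unmerged and dropped by the unload step.
    merged_model = peft_model.merge_and_unload(adapter_names=["default"])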
3ff90626b6 FEAT: Make safe serialization the default one (#1088)
* make safe serialization the default one

* adapt tests

* fix final tests'

* adapt from suggestion
2023-11-15 11:21:23 +01:00
1877329093 TST Improve requires grad testing: (#1131)
Previously, the corresponding tests were testing only whether specific
parameters had requires_grad True or False. Now, all parameters are
being checked. This is more rigorous.

Also, tests for Embedding, Conv1D, Conv2d were added, thus superseding
PR #1115.

Finally, tests for LoHa and LoKr were added.

Note

I considered moving the tests to a separate module, as they were getting
quite big and this would help with readability. For now, I left them in
the same module because it leads to a better diff view and is thus
easier to review. LMK if I should move the tests to a separate file.
2023-11-14 17:44:49 +05:30
98429b8184 Fix: TorchTracemalloc ruins Windows performance (#1126)
* feat: added tracemalloc arg to train_dreambooth

* fix: added help for arg

* fix: changed arg name

* fix formatting

* fix: import order
2023-11-14 17:04:32 +05:30
d350a00ece Prompt tuning: fix AutoTokenizer.from_pretrained (#1053)
Fixes #1032

Description

Currently, when using prompt tuning with TEXT, we call
AutoTokenizer.from_pretrained with only the model id. However, it may be
necessary to pass additional arguments, e.g. trust_remote_code=True.
This fix allows to pass more arguments by setting the argument
tokenizer_kwargs in the PromptTuningConfig.

I also added a check that when tokenizer_kwargs is set, the TEXT option
is actually being used.

Moreover, I noticed that we have no tests for prompt tuning with TEXT,
so I added those tests for decoder models.

Additional changes

There was a bug in PromptEmbedding where the device of the
init_token_ids was not set, which resulted in errors when using CUDA.

Finally, I removed an unused constant CONFIG_CLASSES from a test.
2023-11-14 16:58:55 +05:30
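A sketch of the new option, with a placeholder model id; tokenizer_kwargs is forwarded to AutoTokenizer.from_pretrained and is only valid together with the TEXT init option:

    from peft import PromptTuningConfig, PromptTuningInit, TaskType

    config = PromptTuningConfig(
        task_type=TaskType.CAUSAL_LM,
        prompt_tuning_init=PromptTuningInit.TEXT,
        prompt_tuning_init_text="Classify the sentiment of this review:",
        num_virtual_tokens=8,
        tokenizer_name_or_path="my-org/my-model",          # placeholder
        tokenizer_kwargs={"trust_remote_code": True},
    )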
ad756173f1 FIX: Adding 2 adapters when target_modules is a str fails (#1111)
* Fix adding 2 adapters when target_modules is a str

Problem description

Adding two adapters (e.g. LoRA) when using a list for `target_modules`
works but passing a str fails. The issue is that for str, we do a
`re.fullmatch`, whereas for list, we just check `endswith`. After adding
the first adapter, though, the naming pattern of the modules changes. In
the example above, the name for the linear layer changes from `"lin0"`
to `"base_model.model.lin0"`, which is why the `fullmatch` fails but the
`endswith` still works.

Reproduction

from peft import LoraConfig, get_peft_model
from torch import nn

class MLP(nn.Module):
    def __init__(self, bias=True):
        super().__init__()
        self.lin0 = nn.Linear(10, 20, bias=bias)

def test_target_modules_list():
    config = LoraConfig(target_modules=["lin0"])
    test_it(config)
    print("Adding two adapters with target_module being a list works")

def test_target_modules_str():
    config = LoraConfig(target_modules="lin0")
    test_it(config)

def test_it(config):
    model = MLP()
    model = get_peft_model(model, config, "adapter0")
    model.add_adapter("adapter1", config)
    print("Adding two adapters with target_module being a str works")

if __name__ == "__main__":
    # works
    test_target_modules_list()
    # ValueError: Target modules lin0 not found in the base model
    test_target_modules_str()

I think that most users would be surprised that:

1. Adding the first adapter works but adding the second fails, even
   though they use the same config.
2. Using `target_modules=["lin0"]` works but `target_modules="lin0"`
   fails for the 2nd adapter.

Solution

We could change the logic of not using `re.fullmatch` for str, but I
think that could be tricky to achieve without breaking BC. Instead, I
chose to change the inject_adapter call in add_adapter to pass the base
model, not the whole peft model. This way, the naming pattern is
preserved.

Tests

I haven't added extra tests for this. The script above could serve as a
test. However, it will be sufficient to remove the guard added in #1105:

    if isinstance(config.target_modules, str):
        # TODO this should be doable
        self.skipTest("Multiple adapters cannot currently be added when target_modules is a string.")

as that will test exactly this behavior and was how the bug was
originally uncovered. Depending on which PR lands first, the guard has to be
removed in this PR or in #1105.

* Enable tests for adding 2 adapters with str
2023-11-14 15:00:52 +05:30
94877b5008 Release: v0.6.3.dev0 (#1128) 2023-11-14 14:59:55 +05:30
f020404ee6 Release: v0.6.2 (#1125) 2023-11-14 11:13:21 +05:30
79298c7c24 fix doc typo (#1121) 2023-11-13 10:48:50 +01:00
b25ce8a0cd Correctly deal with ModulesToSaveWrapper when using Low-level API (#1112)
* correctly deal with  `ModulesToSaveWrapper`

* style

* fix tests (#1117)
2023-11-13 12:22:30 +05:30
5d84484079 fix import issue transformers (#1116) 2023-11-10 18:37:38 +01:00
49ddefa834 Add num_dataloader_workers arg to dreambooth script (#1107)
This is especially important for Windows users, who may have to set the
number of workers to 0.
2023-11-10 14:21:14 +01:00
3af469eeea Refactor adapter deletion (#1105)
Description

The job of deleting an adapter is now transferred to the adapter layer,
instead of the adapter model. This makes it easier for users or other
libraries who don't use the adapter model to delete adapters.

Implementation

The code should now be more generic, relying less on hard-coded
attributes.

As a precaution, I also changed the type of adapter_layer_names from
list to tuple, as it should not be mutated.

When deleting the active adapter, the logic for choosing the new active
adapter has been changed slightly to ensure consistency across layers.
In practice, this should rarely make a difference. An error is now
raised if the last remaining adapter is deleted.

Test coverage has been increased:

- Deleting adapters is now also tested for custom models.
- It is also tested for LoHa, LoKr, not only LoRA.
- I added a test for deleting the non-active adapter.

Not implemented

I did not add adapter deletion to IA³, since it is included in #980. LMK
if it should be added here instead.
2023-11-10 13:33:56 +01:00
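A short usage sketch, assuming a peft_model that has an adapter named "other" in addition to the active one:

    # Delete a named adapter; per the commit above, deleting the last
    # remaining adapter raises an error.
    peft_model.delete_adapter("other")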
5e7e5ad836 Avoid over-eager auto-gptq import (#1109) 2023-11-10 12:35:18 +01:00
9d8287f3e3 set dev version (#1104) 2023-11-09 15:44:28 +01:00
2efd02769b Release: 0.6.1 (#1103) 2023-11-09 15:16:33 +01:00
669dd4edeb Change to 0.6.1.dev0 (#1102)
* change to 0.6.1.dev0

* oops
2023-11-09 15:03:15 +01:00
b5641cc744 [core] Fix safetensors serialization for shared tensors (#1101)
* fix st serialization

* add test

* add CI test

* add comment
2023-11-09 14:50:35 +01:00
c5d94855cd FIX Failing nightly CI tests due to IA3 config (#1100)
Same idea as in PR as #1094, but for yet more ill-configured IA³
configs. The tests are now failing because we do stricter checks on
incorrect IA³ configs.
2023-11-09 13:50:44 +01:00
face67dfeb Fix IA3 config for Falcon models (#1007)
* fixed feedforward for falcon

* fixed target_modules for falcon
2023-11-09 12:41:57 +05:30
d9094cebea FIX: broken f-string in import_utils (#1091) 2023-11-08 12:12:24 +01:00
493ae58beb fix the failing CI tests (#1094) 2023-11-08 14:47:55 +05:30
ed4ce9fc94 fix-gptq-training (#1086)
* fix-gptq-training

* style

* review
2023-11-07 11:12:23 -05:00
4c48970cb0 Update the release checklist (#1075)
As discussed, we wanted to make small amendments to the release process,
so that we have a 0.N.0 commit on main. I also adjusted the wording here
and there.
2023-11-07 14:23:38 +01:00
46e03602ed [Docker] Update Dockerfile to force-use transformers main (#1085)
* Update Dockerfile

* Update Dockerfile

* Update Dockerfile
2023-11-07 12:20:15 +01:00
45343a4ccc Improve documentation for IA³ (#984)
- Improve ia3 documentation
- Raise value error for incorrect feedforward_module list
- Added tests

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-11-07 11:44:27 +01:00
276c91b143 FIX: fix adaptation prompt CI and compatibility with latest transformers (4.35.0) (#1084)
* fix adaptation prompt CI

* undo some other changes
2023-11-06 14:04:19 +01:00
cfe35a7878 FIX: Skip adaption prompt tests with new transformers versions (#1077)
Adaption prompt is failing with transformers v4.35.0. This PR skips the
adaption prompt tests so that CI is green again. The PR also adds an
error when users try to use adaption prompt with that version,
instructing them to use an older transformers version instead.

This should be removed as soon as the issue is fixed in
PEFT/transformers.
2023-11-03 15:52:51 +01:00
d47d23aa0e After release: Bump version to 0.7.0.dev0 (#1074) 2023-11-03 11:25:04 +01:00
02f0a4ca59 Release version 0.6.0 (#1072) 2023-11-02 15:07:03 +01:00
23cfbf22eb Fix slow tests not running (#1071)
* Update nightly.yml

* Update nightly.yml
2023-11-02 10:23:17 +01:00
9da72d25ed Fix Slack bot not displaying error messages (#1068)
* Update log_reports.py

* Update log_reports.py

* Update log_reports.py

* change logic

* fix
2023-11-01 12:41:23 +01:00
0ad95fa361 TST test coverage for layer matching (#1031)
Add tests for module name matching using regex and other custom arguments.
2023-11-01 11:39:40 +01:00
6960076699 [tests] Update Dockerfile to use cuda 12.2 (#1050)
* [`tests`] Update Dockerfile to use cuda 12.2

* Update nightly.yml
2023-11-01 10:48:12 +01:00
bdeb06b16c [core] Fix use_reentrant issues (#1036)
* fix use_reentrant issues

* fix

* fixup

* address comments.

* add warnings

* oops

* fix

* quality
2023-10-31 16:51:41 +01:00
884b1ac3a8 Add implementation of LyCORIS LoKr for SD&SDXL models (#978)
KronA-like adapter
2023-10-30 15:36:41 +01:00
207229ad5e FIX Conv1D merge error for IA3 (#1014) 2023-10-26 15:51:49 +02:00
2464c572eb FIX setting active adapter correctly (#1051)
Currently, when calling set_adapter, the active adapter is not updated.
Tests have been added to trigger the bug and the method updated to fix
it.

Moreover, I created an active_adapters property on the PeftModel class
so that it behaves consistently with the underlying models like
LoraModel.
2023-10-25 14:53:45 +02:00
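A brief sketch of the fixed behavior, assuming a peft_model with an adapter named "other":

    peft_model.set_adapter("other")
    print(peft_model.active_adapters)  # expected to reflect the switch, e.g. ['other']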
8b21a4e5ab DOC fix wrong import in p-tuning docs (#1049) 2023-10-25 11:16:08 +02:00
894e68a408 FIX: wrong construction of LoHa weights (#1021)
Also: Update convert_sd_adapter_to_peft.py to account for a bug in
Lycoris-LoRA. See https://github.com/KohakuBlueleaf/LyCORIS/pull/115
2023-10-24 15:26:42 +02:00
7594903444 DOC: Fix StackLLaMa link, typos in README (#1047) 2023-10-24 12:10:21 +02:00
1d0535e255 Fix target_modules type in config.from_pretrained (#1046)
Fixes #1045, supersedes #1041

Description

When loading a config from a file, we currently set the loaded
attributes on the config directly. However, this sidesteps the
__post_init__ call, which is required to convert the target_modules to a
set. This PR fixes this by avoiding to set attributes on the config
class directly, instead of going through __init__.

Other changes

While working on this, I did a slight refactor of the config tests.

1. All config classes are included now (some where missing before).
2. Use parameterized instead of looping through the classes.
3. Added a unit test for the aforementioned bug.
2023-10-24 12:06:01 +02:00
56556faa17 [LoRA] Raise error when adapter name not found in set_scale (#1034)
* fix scale nit

* style

* nit
2023-10-18 19:36:03 +02:00
15a013af5f [LoRA] Revert original behavior for scale / unscale (#1029)
* revert original behavior for scale / unscale

* harmonize arg name

* credits contrib

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-10-17 00:27:02 +02:00
45565f4357 fix lora scaling and unscaling (#1027) 2023-10-16 10:10:30 -07:00
aaa7e9f44a FEAT: Add fp16 + cpu merge support (#1017)
* add fp16 + cpu merge support

* fix tests

* add fp16 tests for custom models

* fix tests

* adapt from comments

* more clarifications
2023-10-13 12:23:16 +02:00
07f2b82dae Fix stale.py to use timezone-aware datetime (#1016)
Fix an error with our stale.py script:

> can't subtract offset-naive and offset-aware datetimes

https://github.com/huggingface/peft/actions/runs/6497439325/job/17646562512
2023-10-12 18:42:06 +02:00
eced2edff8 FIX Don't assume model_config contains model_type (#1012) 2023-10-11 10:34:28 +02:00
e98df91906 ENH: Refactor LoRA bnb layers for faster initialization (#994)
Partly addresses #896

Description

After speeding up normal LoRA layer initialization, this PR improves
initialization speed of bnb LoRA layers.

The method to achieve this is different from the one used before, namely
this time the base layer is stored as a reference on the LoRA layer.
This allows us to avoid calling __init__ on the bnb layer, which is what
is slow.

Notes

We cannot use the same method as for the normal LoRA layers, (i.e.
calling the super class's __init__ with meta device) because the bnb
layers have extra logic that still creates unnecessary weights.

However, the way used here could also be a solution to the normal
layers, so if we want to have consistency, the normal layers could be
refactored to use the same approach.

Interestingly, even though we now save the base layer as a reference,
which results in a different state_dict, the existing models can still
be loaded successfully. This is because the adapter state_dict is not
affected by the change, so users can still load their existing adapters.

The only problem would occur if users dump the whole model, i.e. base
model and adapter, using torch.save and then trying to load with
torch.load. For those users, we could theoretically provide a script to
convert the state_dict (i.e. renaming some keys).

To ensure that the old adapters can still be loaded successfully, I'm
working at the same time on adding regression tests. I'll create a
separate PR for those to avoid blowing up this one.

Tests

I ran a test on bloomz-1b1 for how long it takes to create the
PeftModel, the results are:

8bit: 1108.34 ms > 26.82 ms
4bit: 1101.96 ms > 23.69 ms
2023-10-10 16:47:35 +02:00
0c16918c34 [core / LoRA] Add safe_merge to bnb layers (#1009)
* add `safe_merge` to bnb layers

* adapt from suggestions
2023-10-10 11:30:21 +02:00
c2c544dc9f FEAT: Add safe_merge option in merge (#1001)
* add `safe_merge` option in `merge`

* oops

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address final comments

* Update src/peft/tuners/lora/layer.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Update src/peft/tuners/lora/layer.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* add it for ia3

* add it for adalora

* up

* revert for loha

* style

* fix CI

* adapt from suggestions

* add tests

* up

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-10-09 18:28:00 +02:00
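A hedged usage sketch, assuming an existing LoRA peft_model:

    # safe_merge performs the merge on a copy of the weights and checks for
    # NaNs before committing, at the cost of some extra memory.
    merged_model = peft_model.merge_and_unload(safe_merge=True)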
d7f520a320 Fix word_embeddings match for deepspeed wrapped model (#1000)
* vocab size prompt vocab fix

* add comments

* Update src/peft/peft_model.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-10-09 14:25:07 +02:00
d17266d599 ENH Support Conv2d layers for IA³ (#972)
Adds support for Conv2D layers to the IA³ tuner. Tests are added to
check that they work.

Notes:

Unfortunately, when unmerging the Conv2d IA³ layers, there is quite a
bit of rounding error. I had to increase the tolerances for this
specific test case to make the tests pass. I'm not 100% sure why this
is, but I could imagine that for Conv2d, small errors accumulate because
of the convolution operation.

I also added tests for IA³ Linear layers for the custom models, which
also pass. However, there is an error when using Conv1D. The reason is
that merging fails because there is a shape mismatch when
fan_in_fan_out=True (which is set automatically for Conv1D). This is
left for a future PR.
2023-10-09 12:20:19 +02:00
dfd99f61f8 TST: Comment out flaky LoHA test (#1002)
This test is flaky when running on Windows. It is probably related to
PyTorch 2.1, as this test used to work. Further investigation is needed.
2023-10-09 10:33:54 +02:00
dbd40d96a1 Fix lora creation (#993)
* reducing the time for inject lora modules

* fix bugs

* fix bug

* fixes

* Revert "fixes"

This reverts commit c7f30627c1798db11be8a5da8f3c801f9469a5e3.

* refactor

* fix failing tests

* fix tests

* fix tests

* fix tests

* fix tests

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address comments

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-10-05 13:27:49 +05:30
99f792e8a3 MNT Make .merged a property (#979)
After adding support for multiple active adapters, we have some double
bookkeeping when it comes to tracking merged adapters. On the one hand,
we have merged_adapters, which lists all merged adapters, and on the
other hand, we have the merged attribute, which indicates if at least
one adapter is merged.

Having two sources of truth is bad, because it's more work to keep them
in sync and there is a risk of them getting out of sync. Therefore, this
PR removes the .merged attribute. In order to keep the same interface,
we add a merged property, which returns True if there are any
merged_adapters.
2023-10-04 11:39:36 +02:00
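A minimal, self-contained sketch of the idea (names follow the commit message, not PEFT's actual class):

    class ToyTunerLayer:
        def __init__(self):
            self.merged_adapters: list[str] = []  # single source of truth

        @property
        def merged(self) -> bool:
            # True if at least one adapter has been merged into the base weights.
            return bool(self.merged_adapters)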
a7fb9fb090 Add base model metadata to model card (#975)
Resolves #938

This PR adds the base model metadata, if present, to the model card.

On top of this, the code for creating the model card has been refactored
to use the huggingface_hub classes instead of doing ad hoc parsing and
writing.
---------

Co-authored-by: Lucain <lucainp@gmail.com>
2023-10-04 09:44:10 +02:00
a977ce69a5 Fix typo in custom_models.mdx (#964)
* Fix typo in custom_models.mdx

* Fix typo in low_level_api.mdx
2023-10-03 18:05:06 +05:30
3d0edccc4a Correct minor errors in example notebooks for causal language modelling (#926)
* updated Readme

* Corrected label bos token error; switched to eos token from pad token

* reverted readme change
2023-10-03 18:01:10 +05:30
763511dc28 add the lora target modules for stablelm models (#982) 2023-10-03 17:59:07 +05:30
1367bc6f0d FIX: issues with (un)merging multiple LoRA and IA³ adapters (#976)
* Fix issues with merging multiple adapters

This should resolve the failing slow test
test_4bit_merge_and_disable_lora.

While investigating, I also noticed that merging multiple adapters was
not correct for IA³. I added a test that should catch this bug and
provided a fix for it too. However, the test does not check IA³ at the
moment because the test parameters do not contain IA³. For this, #972
needs to be merged too, which adds IA³ to the test parameters.

* Small adjustments to tests

Previously, tests had some exploding gradients, making them unstable.
2023-10-03 16:53:33 +05:30
88dfc5d2a8 update Bibtex (#989) 2023-10-03 14:01:09 +05:30
7a5f17f39e FEAT Add LyCORIS LoHa for SD&SDXL models (#956)
https://arxiv.org/abs/2108.06098
2023-10-02 10:44:51 +02:00
52ff0cde9f Fix missing tokenizer attribute in test (#977)
Fix the error in test_causal_lm_training_mutli_gpu_4bit.
2023-09-29 15:34:43 +02:00
cacee957e6 [tests] add multiple active adapters tests (#961)
* add tests for multiple active adapters

* add multiple active adapter tests

* fix tests

* fix the device error

* fix typo

* fix the variables

* fix the `adalora` config

* add util function for proper naming of tests

* fix bugs

1. fix `add_weighted_adapter` when working with adapters targeting different layers
2. fix `ia3` model and layer to handle adapters targeting different layers
3. fix the multiple active adapter tests

* fix `ia3` issue

* remove debug statements

* fix test

* fix bug

* address comments

* address comments

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fix tests

* remove unused code

* Update test_custom_models.py

* increasing tolerance for a test

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-09-29 09:44:30 +05:30
bedcaa4f82 TST: Fix broken save_pretrained tests (#969)
Resolves #968

As the linked issue mentions, the test_save_pretrained_selected_adapters
test was broken because the 2nd adapter would load the weight of the
default adapter, instead of its own weights. This was a pretty quick
fix.

However, this made me wonder why the test was passing beforehand when it
is loading the wrong weights, so I dug deeper.

The first issue I encountered was that for IA³, we did not set
init_ia3_weights=False. For this reason, all weights were set to 1.0. Of
course, in that case, it doesn't matter what weights are loaded, they're
all the same. Therefore, I changed it to init_ia3_weights=False.

However, this still doesn't explain why this worked for LoRA, which,
even without init_lora_weights=True, should have some completely random
weights which should not be the same.

The reason for that is that we used get_peft_model_state_dict to get the
state_dict we used for comparison. This function only returns the
weights of one of the adapters (in this case default), so the weights
for the new adapter were never compared at all. Thus I changed this so
that all weights are now compared.

However, this now caused the tests for prompt encoders to fail. For some
reason, the state_dict from prompt encoding models is not the same after
loading. At this point, I stopped investigating and just wrote an
exception for prompt encoding to use get_peft_model_state_dict instead
of comparing the whole state_dict. Any insights would be appreciated.

Finally, for completeness, I added some checks for the existence of the
files of the new adapter.
2023-09-28 16:33:16 +02:00
f66c3859b0 add the lora target modules for Mistral Models (#974) 2023-09-28 14:55:13 +05:30
69665f24e9 Update integrations_tests.yml (#966)
* Update integrations_tests.yml

* Update integrations_tests.yml
2023-09-26 14:45:01 +02:00
08b6665167 ENH Add 4-bit support for IA3 (#864)
Notes:

- Add guard to IA³ Linear8bitLt definition (should have already been there).
- Merging not supported (yet).
2023-09-26 14:11:32 +02:00
d54a23d30e Fix integrations_tests.yml (#965) 2023-09-26 14:01:22 +02:00
9856f79cf9 [tests] add transformers & diffusers integration tests (#962)
* add transformers integration tests

* add diffusers

* test also on transformers release

* adapt from suggestions

* suggestions
2023-09-26 13:00:57 +02:00
634bd197f2 FIX: setting requires_grad on adapter layers (#905)
* [WIP] Fix setting requires_grad on adapter layers

This is an alternative to #900, resolves #899.

Description

Currently, we don't handle setting requires_grad on adapter layers
really well. The main issue is that it can be set to True on adapter
parameters that are not being used, e.g. the original_module in
ModulesToSaveWrapper or inactive adapters in LoRA.

Normally, this is not a big issue, except maybe if we want to correctly
count the number of trainable parameters. However, when training with
DistributedDataParallel, this results in errors, as PyTorch thinks that
all parameters with requires_grad=True should participate in the loss
computation, but those mentioned parameters don't. For that reason,
training with DDP currently fails when using modules_to_save or multiple
adapters.

Implementation

This turned out to be more complicated than I initially thought. The
logic for setting requires_grad is all over the place, it was hard to
encapsulate the logic and I only succeeded partially. As is, this PR is
more complex than the one it tries to supersede, #900, but it is also
"more correct".

Tests were added to check whether requires_grad is set correctly. There
are (so far) no tests for whether DDP indeed works, they could be added
with multi-GPU. I did, however, test an early stage of this PR with DDP
and setting requires_grad correctly will indeed fix the DDP error.

DONE/TODO

- [x] ModulesToSaveWrapper
- [x] LoRA
- [ ] IA³
- [ ] AdaLora

Since some tuners are not implemented yet, tests are expected to fail.
Check the new tests at the bottom of test_custom.py, those should pass.

* Refactor: move more requires_grad machinery to ABC

* [skip ci] [WIP] Add requires_grad logic to IA³

* Add AdaLora

* Fix some minor issues

* Make style
2023-09-26 13:01:05 +05:30
1af8ca484b feat: add type hints (#858)
* feat: add type hints

* build: trigger ci
2023-09-25 16:42:51 +02:00
1c0654b9a5 support multiple ranks and alphas for LoRA (#873)
* support multiple ranks and alphas

* Update lora.py

* Update lora.py

* commit suggestions

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* address comments

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Fixed multirank + multialpha for sequential LoRAs, added correct support of LoRA-C3Lier conversion (#937)

* Fixed multirank multialpha for sequential loras, added tests, fixed docs

* Refactored kohya_ss conversion script for proper support of LoRA-C3Lier

* Fixed styling

* Removed old comment from docstring

* shift `scale_layer`/`unscale_layer` to `LoraLayer` class to support all the child classes

* support multiple active adapters

* add `active_adapters` property

Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fix bug related to active adapter of `ModulesToSaveWrapper`

* revert the change wrt active_adapter assignment

Co-Authored-By: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* addressing comments

* address comments

* address comment

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Alexander Kovalchuk <kovalexal@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-09-23 01:33:44 +05:30
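A hedged sketch of per-module rank/alpha overrides; the rank_pattern/alpha_pattern parameter names and the module names below are assumptions based on this commit, and unmatched modules fall back to r and lora_alpha:

    from peft import LoraConfig

    config = LoraConfig(
        r=8,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj"],
        rank_pattern={"v_proj": 16},
        alpha_pattern={"v_proj": 32},
    )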
1dc4a6761e Install correct PyTorch nightly in GH action (#954)
For the GH action about running torch.compile, when using the nightly
options, install from the correct index (used to be test, now is
nightly).
2023-09-21 16:15:11 +02:00
f3d4fef6e6 Allow compile GH action to run on torch nightly (#952)
If the action is triggered with nightly=true, the nightly PyTorch
version will be installed.

Also, added a line that *may* enable the action to run on forks, not
sure.
2023-09-21 09:57:46 +02:00
39264a0141 Fix some tests that would fail with torch.compile (#949)
Some tests would currently fail with torch.compile, not because there is
anything wrong with how PEFT works with compiled models, but simply
because of the way the tests are written. This is because when models
are compiled, the keys of the state dict change. Tests have now been
adapted to unwrap the compiled model first before getting the state
dict.

Note that the mentioned issue does not affect saving and loading,
because save_pretrained is already called on the original module, so
there is no issue with mismatched keys.

Also fixed the docstring of get_peft_model_state_dict.
2023-09-21 09:46:28 +02:00
ba0477f298 ENH error message when choosing wrong bias (#946)
Raise an error with a helpful error message when the user chooses an incorrect
option for the bias argument.

---------

Co-authored-by: datta0 <venkatadattasanimmaturi@gmail.com>
2023-09-20 11:26:35 +02:00
139624750a FIX: torch compile gh action installs pytest (#944)
* FIX: Install pytest for torch compile GH action

* [skip ci] commit to skip CI
2023-09-18 17:17:01 +02:00
1bbde1bfe0 Add GH action to run unit tests with torch.compile (#943)
The GitHub action works by setting an environment variable
PEFT_DEBUG_WITH_TORCH_COMPILE=1. This causes the tests to run with
torch.compile if get_peft_model is being used. The action is triggered
manually and requires to indicate the branch to run it on.

With this action, we should be able to incrementally add support for
torch.compile in PEFT without disrupting the existing tests. Once we
fully support torch.compile, we can think about adding to the tests by
default and to remove the code from this PR.
2023-09-18 16:58:01 +02:00
6b4554e643 add scale_layer / unscale_layer (#935) 2023-09-15 13:47:09 +02:00
c8c936eddf pin diffusers (#936) 2023-09-15 13:46:43 +02:00
93d0c03d5b Fixed LoRA conversion for kohya_ss (#916) 2023-09-14 11:00:24 +02:00
5bdbf2bcd6 fix lora layer init (#928) 2023-09-14 03:41:21 -04:00
4c611f40b4 MNT Add accelerate min dependency (#892)
Because of is_npu_available import.
2023-09-12 11:25:33 +02:00
8bdd4848f4 Make base_model.peft_config single source of truth (#921)
Resolves #802, #923

For the problem description, please check the first issue.

I went with solution 2, i.e. making the base_model.peft_config the
"single source of truth" for the PEFT configuration. That way, we
minimize the risk of diverging configurations.

This does not apply to prompt learning, where we don't have a
peft_config on the base model (which is just the normal model, not a
PEFT class).

I added a setter for peft_config but from my testing, it isn't being
used. It's only there for completeness.
2023-09-12 11:12:40 +02:00
b786b884f6 ENH speed up init emb conv2d (#915)
Partly resolves #872

Description

After getting faster initialization of the LoRA Linear layer,
initialization of Conv2D and Embedding is now sped up.

Implementation

The approach of how to achieve the speed up has slightly changed
compared to last time. To refresh memory, in #887, we avoided the
unnecessary initialization of the full weight matrix by completely
skipping nn.Linear.__init__.

Although it is possible to do the same for Embedding and Conv2d, we run
into some trouble here. The issue is that the __init__ methods of these
classes have quite a lot more arguments and some custom logic (i.e. not
only self.foo = foo but more on top). If we wanted to skip __init__
entirely, we would have to basically copy all of that into our code.
Although that is possible, it is brittle (e.g. the logic could be
different for different PyTorch versions or change over time).

For that reason, I opted to implement this differently, using a
suggestion we had discussed earlier. The approach is to call __init__ of
the parent class but enforce empty weights (this is what
torch.nn.utils.skip_init does, although we cannot use that function
directly). This way, we can avoid having to copy the __init__ code while
still avoiding expensive initialization of the weights.

I did not change the code for Linear to also use this approach because
the logic inside of Linear.__init__ is quite simple (at least for now),
so we are good here with the existing approach.

However, I was curious how changing the approach for Linear would affect
the initialization speed. Therefore, I ran the script from #872 again, 3
times each.

Current approach:

test 1 with model bert-base took 0.021 sec.
test 1 with model bert-base took 0.020 sec.
test 1 with model bert-base took 0.020 sec.
test 2 with model bloomz-1b7 took 0.030 sec.
test 2 with model bloomz-1b7 took 0.030 sec.
test 2 with model bloomz-1b7 took 0.030 sec.

New approach if applied to Linear:

test 1 with model bert-base took 0.038 sec.
test 1 with model bert-base took 0.039 sec.
test 1 with model bert-base took 0.038 sec.
test 2 with model bloomz-1b7 took 0.072 sec.
test 2 with model bloomz-1b7 took 0.048 sec.
test 2 with model bloomz-1b7 took 0.048 sec.

This shows that the new approach is indeed a bit slower than the
existing one, though still a lot faster than what we had before. IMHO, I
think we're safe to leave the code inside of Linear as is and benefit
from the slightly better performance at the cost of slightly more
fragile code.
2023-09-12 11:05:29 +02:00
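torch.nn.utils.skip_init illustrates the general idea described above (PEFT's internal implementation differs in its details):

    from torch import nn
    from torch.nn.utils import skip_init

    # __init__ runs with parameters on the meta device and empty storage is
    # materialized afterwards, so the costly default weight initialization
    # never happens.
    emb = skip_init(nn.Embedding, num_embeddings=50_000, embedding_dim=1024)
    print(emb.weight.shape)  # torch.Size([50000, 1024]); values are uninitialized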
0fa63fb4a2 DOC: Section on common issues encountered with PEFT (#909)
So far, this section contains two types of issues, not using latest package
versions and issues with loading PEFT models. This is based on what I
feel are issues that are commonly brought up.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-09-08 11:25:33 +02:00
f5aae1b47d ENH Merge lora module to 8bit model (#875)
Allows merging 8bit weights from bnb.

4bit weight merging was already implemented through the dequantization method
provided by bnb but there is no official dequantization method for 8bit weights.
This PR works by multiplying the weights by an identity matrix using bnb's
quantization-aware matmul operation. Empirically, this results in a very small
rounding error.
2023-09-07 12:14:37 +02:00
6d140bad39 support prefix tuning for starcoder models (#913)
* support prefix tuning for starcoder models

* remoce the test filter for prefix tuning tests for StarCoder models
2023-09-07 15:06:42 +05:30
1f55957402 DOC remove double setup section (#911) 2023-09-07 10:41:14 +02:00
08368a1fba ENH Remove redundant initialization layer calls (#887)
This should lead to a big speedup when initializing LoRA layers.

---------

Co-authored-by: poedator <ruslansv@gmail.com>
2023-09-06 17:31:55 +02:00
20d9c175e2 FIX linting issue in example (#908) 2023-09-06 11:59:46 +02:00
d4dbf684e0 FIX gradient_accumulation_steps in examples (#898)
* fix gradient_accumulation_steps in examples
* update accelerator init
2023-09-06 11:14:57 +02:00
0c9354bda9 DOC Fix for semantic_segmentation_lora (#891)
An argument was renamed.

---------

Co-authored-by: Raghavan <oneraghavan@gmail.com>
2023-08-31 14:07:19 +02:00
f113af0b9e FIX: error using deepspeed zero2 + load_in_8bit + lora (#874)
Fix an issue in (Ada)LoRA forward of bnb layers when using bf16 + lora +
load_in_8bit.
2023-08-31 12:48:39 +02:00
43381008d6 Update build_docker_images.yml (#889) 2023-08-31 12:40:06 +02:00
7d99466446 DOC: PeftModel save_pretrained docstring (#881) (#888) 2023-08-30 17:16:22 +02:00
ecaaae8719 MNT Run tests that were skipped previously (#884)
Some tests were skipped because of an issue with how LoRA weights were
initialized for embeddings. This issue has been fixed for some time now,
so the tests no longer need to be skipped.
2023-08-30 14:40:56 +02:00
0b2f950cc2 FIX: Error in forward of 4bit linear lora layer (#878)
This was introduced during the refactoring of the forward function. It
should now be fixed and be equivalent to the forward function before the
refactoring:

4df9c5a243/src/peft/tuners/lora.py (L1207)

Bug reported by @jiqing-feng
2023-08-30 10:52:43 +02:00
85013987aa MNT: Move tuners to subpackages (#807)
For each tuner, created a sub-module that contains at least:

- config.py for config stuff
- model.py for the actual model/encoder/embedding
- __init__.py so that imports are preserved

Then, when there was a need, further files were created, like layer.py
or utils.py.

Imports were changed to absolute imports everywhere, except for the
sub-packages within a tuner directory, as these packages will always 
stay together in the same place.

For some existing modules, the license comment at the top of the file
was missing; I added it wherever that was the case.

There was a bug in the forward method of 4bit linear lora layers introduced
in #851, for the case that the model is merged AND adapters are disabled.
For that scenario, we need to unmerge first before generating the output,
same as we do for the vanilla Linear layer. This step was missing from the
code previously and is now implemented correctly. Tests were adjusted to
catch that error.
2023-08-29 11:32:29 +02:00
a23b9213f4 FIX: seq2seq prompt tuning (#439) (#809) 2023-08-29 10:53:14 +02:00
140a69bb90 Support merge lora module for 4bit and 8bit linear (#851)
* support merge lora module for 4bit and 8bit linear

* add tests for merging lora module to 8bit and 4bit model

* state should reset grad

* add prepare output before and after merge lora

* fix format

* fix format 2

* fix format 3

* add warning

* fix parameter format

* remove 8bit merge

* remove 8bit linear merge

* add comment for 4bit merge
2023-08-28 19:59:03 +05:30
8c17d556a8 DOC Fix typos in ia3 config docstring (#844) 2023-08-25 11:19:40 +02:00
0e37b85609 🎉 Add Multitask Prompt Tuning (#400)
* mpt

* fix save

* fix save

* add jupyter notebook

* add jupyter notebook

* add jupyter notebook

* drop shuffling

* drop classify_dataset

* drop classify_dataset

* fix keys

* fix keys

* add comments

* use EXACT_SOURCE_TASK in the example

* formatting

* Fix dict index in embedding retrieval

* run style and quality

* run style and quality

* run style and quality

* style

* final fix

* style

* comment out failing tests

* fix generation tests

* fix style and save test

* all testcases

* fix import

* add license header

* reformat

* fix encoder-decoder models

* fix tests running multiple times

* fix paper name for IA3 and add MPT paper

* Trigger CI

* address the recommended changes

* reformat

* address suggestions

* address suggestions

* revert reformatting

* revert reformatting

---------

Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
2023-08-25 11:42:11 +05:30
6e783780ca MNT: Refactor tuner forward methods for simplicity (#833)
This is to be consistent with changes in #794

Description

The forward methods of several tuner layers were partly unnecessarily
nested, which makes them harder to read and which can conceal bugs (as
in #794). Therefore, these methods have been refactored to (hopefully)
be as readable as possible. Moreover, the different methods are now
coded as similarly as possible across the different implementations.

On top of those changes, I made the following adjustments:

- added some type hints to the methods of the layers
- removed a comment about code being copied which I think is not
  necessary
- for the lora embedding layer, we sometimes used F.embedding(...) and
  sometimes nn.Embedding.forward(self, ...) -- now, we consistently use
  only F.embedding(...)
- for IA³ Linear8bitLT, apply dtype conversion regardless of whether
  self.is_feedforward is True or False (as discussed internally)
- for the definition of lora Linear4bit, we (implicitly) checked if bnb
  is available and if bnb 4bit is available, but it is enough to check the
  latter, as it calls the former internally -- now we only check the
  latter, saving us 1 level of indentation
- a few times, ModuleDict.update(<dict>) is called when <dict> has only
  a single item -- I simplified this code to just assign that item, which
  is more readable and consistent with other nearby code
- removed an unnecessary clone call (was copy/pasted)
2023-08-24 11:04:02 +02:00
fd1c0f66eb Remove backlog section from README.md (#853)
As discussed, since it is not kept up to date.
2023-08-23 17:22:35 +02:00
a4ca8fa3b6 DOC Clarify the new model size in README (#839) 2023-08-23 13:30:08 +02:00
3d9ceb5162 DOC: Add a contribution guide (#848)
As discussed internally, we would like to add a contribution guide to
facilitate contributions from the community and clarify the
requirements.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-08-22 18:35:40 +02:00
bbaafc2fef Release version 0.6.0.dev0 (#849) 2023-08-22 17:04:42 +05:30
573cb35036 DOC Fixed typos in custom_models.mdx (#847) 2023-08-22 11:46:23 +02:00
6c44096c7b Type annotation fix (#840) 2023-08-19 17:53:15 +02:00
4b371b489b [Low-level-API] Add docs about LLAPI (#836)
* add docs about LLAPI

* address comments
2023-08-18 16:05:07 +02:00
87c1d2410e [Tests] Add 4bit slow training tests (#834)
* add 4bit slow training tests

* oops

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-08-18 12:00:20 +02:00
2439203eff fix gptq dockerfile (#835) 2023-08-18 10:54:50 +02:00
312d294fdd Fix unbound error in ia3.py (#794)
Fix an error in IA³'s Linear8bitLt's forward method that resulted in an unbound
local error when using FP16.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-08-17 13:35:17 +02:00
369a0fba85 TST: add test about loading custom models (#827)
Prompted by #808, I added a test that shows that loading a trained
custom model works as expected. I only added this to custom models
because it involves a few steps of training and I didn't want to slow
down tests too much. LMK if this should be added to all tests.

In addition, I renamed some custom model tests which had strange
names (probably caused by an overeager query-replace).
2023-08-16 10:57:38 +02:00
438b16b8c9 Merging LoRA adapters: concatenation feature, more SVD options (#817)
Added a new feature to concatenate the LoRA weights as a mixing method.
SVD now accepts more options and no longer clamps by default.
2023-08-16 10:51:27 +02:00
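A small usage sketch of the mixing options described above, assuming a recent PEFT version; the toy model, adapter names, and ranks are made up for illustration:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

# Toy model with two LoRA adapters attached to its Linear layers.
base = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
config_a = LoraConfig(target_modules=["0", "2"], r=8, lora_alpha=16)
config_b = LoraConfig(target_modules=["0", "2"], r=8, lora_alpha=16)
model = get_peft_model(base, config_a, adapter_name="adapter_a")
model.add_adapter("adapter_b", config_b)

# Concatenation-based mixing ...
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.7, 0.3],
    adapter_name="merged_cat",
    combination_type="cat",
)
# ... or SVD-based mixing, which exposes additional options.
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.5, 0.5],
    adapter_name="merged_svd",
    combination_type="svd",
)
model.set_adapter("merged_svd")
```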
dbe7e644f1 Only fail quantized Lora unload when actually merging (#822)
Fix an error when unloading and _not_ merging parameters. This used to raise an
error when the weights were quantized, even though the error is not necessary
when there is no merging.
2023-08-16 10:45:58 +02:00
a916465ad0 GPTQ Integration (#771)
* add gptq lora

* fix peft gptq

* fix condition

* fix test

* remove unused weights

* check type

* style

* change attribute

* remove print

* add exllama

* make style

* refactor + fix tests

* remove print

* remove dep on transformers
2023-08-11 17:31:17 -04:00
412d7bc985 Helper function to update model signature (#784)
Provides helper functions in peft.helpers to update the signature of the
forward or generate method of a PeftModel (or subclass). This can be
useful because the wrapping class may override the docstring and type
annotations of the underlying base model. Applying the helper functions
will restore those, leading to better tab completion, help text, etc.

For the time being, these helper functions are purely optional to use.
At a later stage, we may consider applying them automatically, but that
would require testing to ensure that nothing breaks.
2023-08-10 12:14:40 +02:00
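The underlying idea can be sketched with plain Python; update_signature below is a hypothetical, simplified stand-in for the actual helpers in peft.helpers:

```python
import functools
import inspect

def update_signature(wrapper_fn, source_fn):
    """Copy name, docstring, annotations, and signature from source_fn."""
    functools.update_wrapper(wrapper_fn, source_fn)
    wrapper_fn.__signature__ = inspect.signature(source_fn)
    return wrapper_fn

def base_generate(input_ids, max_new_tokens: int = 20):
    """Generate tokens starting from input_ids."""
    return input_ids  # placeholder for the real model method

def wrapped_generate(*args, **kwargs):
    # A wrapper as a PeftModel might use it; without help, its signature is
    # just (*args, **kwargs) and the original docstring is lost.
    return base_generate(*args, **kwargs)

wrapped_generate = update_signature(wrapped_generate, base_generate)
print(inspect.signature(wrapped_generate))  # (input_ids, max_new_tokens: int = 20)
```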
7d44026dea fix crash when using torch.nn.DataParallel for LORA inference (#805)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2023-08-08 15:07:23 +02:00
ba90047d70 Update docstring of PeftModel.from_pretrained (#799)
1. Addresses
https://github.com/huggingface/peft/issues/430#issuecomment-1666312815
2. Reword docstring to not be LoRA-specific
2023-08-08 14:38:23 +02:00
10cf3a4fa3 add lora default target module for codegen (#787)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2023-08-08 18:08:04 +05:30
aac7722b9e Add adapter error handling (#800)
When a user tries to add a 2nd adapter, Lora and AdaLora make some checks to
ensure the new adapter is compatible with existing adapters. Currently, that
check is performed halfway through the method. This means that if the check
fails, the new adapter is partially applied, leaving the model in a bad state.
The main purpose of this PR is to ensure that the model state is correct after
such a failure is encountered.

Tests were added to catch this potential bug.

While working on this, I also did some related, but not strictly necessary
changes to the add_adapter methods:

- Previously, the peft_config from the PeftModel was passed to the base
  model. This meant that sometimes, the base model would hold a reference
  to PeftModel.peft_config, but not always, as some base models would
  create new dicts. This is problematic, because some code would rely on
  the objects being the same. Now, they are never the same, leading to
  more consistency.
- I think the check whether multiple adapters use biases (which is not
  supported) was accidentally removed by #749. It is added back in.
- Add some type annotations
- Extend docstrings to contain adapter_name
2023-08-08 14:35:19 +02:00
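A sketch of the intended behavior; the toy model, adapter names, and the particular incompatibility used to trigger the failure are illustrative assumptions:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

base = nn.Sequential(nn.Linear(8, 8))
model = get_peft_model(base, LoraConfig(target_modules=["0"], r=4, bias="all"))

try:
    # Multiple adapters with bias are not supported, so this second adapter is
    # expected to be rejected (a ValueError in recent PEFT versions).
    model.add_adapter("second", LoraConfig(target_modules=["0"], r=4, bias="all"))
except ValueError as exc:
    print(f"add_adapter failed: {exc}")

# With this PR, a failed add_adapter no longer leaves a half-applied adapter
# behind; the original "default" adapter remains usable.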
ed396a69ed [core] PEFT refactor + introducing inject_adapter_in_model public method (#749)
Refactors a bit the internals of some PEFT models and introduces a new
method inject_adapter_in_model for users that want to pass a bare model
and a peft config to inject adapters in-place into the model. These
changes are totally BC with the previous PEFT versions.

This PR makes things easier for the PEFT integration in transformers
huggingface/transformers#25077

The main goal of the PR is to expose a new API for advanced users who
want to integrate PEFT methods without having to use the PeftModel
wrapper. A simple use case is someone who wants to inject adapters into
a model while keeping the model's original class, without delegating
that to PEFT, which would create a PeftModel. I faced this issue in
huggingface/transformers#25077. Among other things, this PR refactors
some internals of the PEFT library while keeping it fully backward
compatible.

To tackle the main motivation, I propose to differentiate between two
types of adapters:

1- adapters that are injectable (LoRA, AdaLoRA, IA3)
2- adapters that are not injectable (the rest)

As a first iteration, this API is supported only for scenario 1.
Therefore, I created two abstract classes that make it easy to determine
whether an adapter layer (e.g. LoraLayer) or adapter module
(e.g. LoraModel) follows the minimal requirements (i.e. needed
attributes, etc.)

Other related changes:

1- Creates a new property method is_prompt_learning to avoid importing
   PromptLearningConfig all the way down
2- Introduces a new object TUNERS_MAPPING, which is a mapping of
   supported pluggable adapters
3- Creates two abstract classes
3.1- BaseTunerLayer: a mixin that checks for the minimal required
     attributes a tuner layer should have (active_adapter / _is_plugable)
3.2- BaseTuner: a higher level module mixin that should be used for any
     injectable adapters in the future.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-08-07 16:34:54 +02:00
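Usage of the new public method looks roughly like this; the tiny model below is just for illustration:

```python
import torch
import torch.nn as nn
from peft import LoraConfig, inject_adapter_in_model

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 10)

    def forward(self, x):
        return self.linear(x)

config = LoraConfig(target_modules=["linear"], r=8, lora_alpha=16)
model = TinyModel()

# Inject LoRA layers in place; the result is still a TinyModel, not a PeftModel.
model = inject_adapter_in_model(config, model)
print(type(model).__name__)  # TinyModel
out = model(torch.randn(2, 10))
```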
ec267c644a Allow passing inputs_embeds instead of input_ids (#757)
Resolves #727

Right now, there is an issue with a few PeftModelForXxx classes when
users pass only inputs_embeds but not input_ids. First, the batch size
was being derived from input_ids; now it is derived from inputs_embeds
instead if input_ids is None. Furthermore, a few forward
calls to the base model were not passing the inputs_embeds along, which
resulted in errors down the line. These issues have been fixed now.
2023-08-02 16:59:11 +02:00
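A minimal sketch of the now-supported call pattern, using a small model purely for illustration:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = get_peft_model(
    AutoModelForCausalLM.from_pretrained("gpt2"),
    LoraConfig(task_type=TaskType.CAUSAL_LM),
)

inputs = tokenizer("Hello world", return_tensors="pt")
# Build the embeddings by hand and pass inputs_embeds instead of input_ids.
embeds = model.get_input_embeddings()(inputs["input_ids"])
outputs = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"])
```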
9b5808938f Support NPU adapter loading (#772) 2023-08-02 12:30:02 +02:00
b10a8cedf6 Support XPU adapter loading (#737) 2023-08-01 15:46:18 +02:00
bfb264ad96 Add progressbar unload/merge (#753)
* add progressbar unload/merge

* make style

* manual fix style

* Update src/peft/tuners/lora.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/peft/tuners/lora.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-08-01 12:26:17 +02:00
702f9377e3 Add tests for AdaLoRA, fix a few bugs (#734)
So far, there have been no tests for AdaLoRA. This PR adds tests similar
to the existing ones. While working on those tests, a few bugs were
encountered and fixed.

The changes made to AdaLoRA:

- Linked to paper abstract, not pdf.
- Don't assume that target modules have a .bias attribute (same as for
  LoRA).
- Fixed an issue where it was assumed that if an output object from
  forward has a .loss attribute, it is a scalar, when it can be None.
- Fixed an issue that when init_lora_weights=False, the weights were
  still initialized to be an identity transform.
- When replacing modules, if a target module is a ModuleList or
  ModuleDict, they are now skipped instead of raising an error that the
  module type is not supported. My reasoning was that it is never intended
  to change those modules, so if their names are matched, it must be a
  false positive. The issue arose because for some target modules, the
  names are just k" etc., and since we match with endswith, this can
  easily lead to modules like "block" to match.
2023-07-28 13:06:53 +02:00
0e33ac1efe DOC: Examples for LoRA with custom models (#724)
Example 1: training a multilayer perceptron
Example 2: fine-tuning a timm image classifier
New section "Developer Guides" in docs.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-07-27 15:28:33 +02:00
e27e883443 [ModulesToSave] add correct hook management for modules to save (#755)
* add correct hook management for modules to save

* forward contrib credits from finding the solution

* add nice GPU tests

* quality

---------

Co-authored-by: BenjaminBossan <BenjaminBossan@users.noreply.github.com>
2023-07-27 10:29:32 +02:00
ffbb6bcf9c Add btlm to officially supported LoRA (#751) 2023-07-26 22:18:37 +05:30
8541b60acb fix adalora inference issue (#745) 2023-07-26 14:29:25 +02:00
96c0277a1b Updated Example in Class:LoraModel (#672)
* updated Example in Class:LoraModel

* update docstring

* Update src/peft/tuners/adalora.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/peft/tuners/lora.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* update adalora.py for doc style check

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-07-24 16:48:59 +02:00
b15c185939 FIX: Disabling adapter works with modules_to_save (#736)
Resolves #493

For LoRA and IA³, there was a bug that even when using the
disable_adapter context, if the module was listed in modules_to_save,
the updated weights would be used instead of the original weights. This
meant that disable_adapter would not return the same results as the base
model without adaptation. This PR fixes the issue and provides a test.

Note: I tried to adjust AdaLoRA too, since it seemed that the same
reasoning should apply there. However, I think that AdaLoRA does not
really support disabling adapters at all. E.g. there is no
disable_adapter_layers method. Therefore, AdaLoRA was not changed.
2023-07-24 13:23:23 +02:00
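The disable_adapter context referenced above is used like this (a minimal sketch with a toy model; with this fix, modules listed in modules_to_save are also switched back to their original weights inside the context):

```python
import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model

base = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
# init_lora_weights=False so the adapter is not an identity transform.
model = get_peft_model(
    base, LoraConfig(target_modules=["0"], r=4, init_lora_weights=False)
)

x = torch.randn(1, 4)
with model.disable_adapter():
    y_base = model(x)   # matches the un-adapted base model
y_adapted = model(x)    # uses the LoRA weights again
```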
a955ef1088 ENH: Warn when disabling adapters and bias != 'none' (#741)
For LoRA, given that bias='all' or bias='lora_only', when doing inference
with a model in the disable_adapter context, the output will not be
identical to the output of the base model. This may be surprising to
users. Therefore, a warning is given. Furthermore, the docstring has
been extended to reflect this fact.
2023-07-24 10:34:39 +02:00
e06d94ddeb Fixes warning when initializing prompt encoder (#716)
Right now, when the user initializes a prompt encoder with MLP, they get
a warning that a certain argument is ignored, and there is no possible
value for the argument that would stop the warning. Usually, warnings
indicate that something is (probably) going wrong, but here,
everything is going as expected. Therefore, by default, I would not give
this warning, thus avoiding users getting confused.

However, I would still give the warning if the user set the argument for
encoder_num_layers explicitly to a different value. In that case, they
expect the change to make a difference, but since the argument is
ignored, their expectation is not met, which warrants a warning.
2023-07-19 16:08:29 +02:00
1681cebf60 [Patch] patch trainable params for 4bit layers (#733)
* patch trainable params for 4bit layers

* revert

* added tests.

* added comments.

* addressed final comments
2023-07-19 14:57:14 +02:00
a09f66c8cd [Llama2] Add disabling TP behavior (#728)
* add disabling TP behavior

* add comments

* adapt from new changes of transformers PR
2023-07-19 14:29:36 +02:00
1869fe6e05 FIX: add type information to package_data (#729)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-19 12:35:39 +02:00
1c27e24d50 revert change (#731) 2023-07-19 14:29:55 +05:30
30fd5a4c88 fix the param count when using 4-bit bnb 2023-07-19 13:22:25 +05:30
3040782e04 Add falcon to officially supported LoRA & IA3 modules (#722)
* add falcon to officially supported modules

* add lora

* add also `RefinedWeb`
2023-07-19 11:18:45 +05:30
1b8b17de86 Fix subfolder issue (#721)
* fix subfolder issue

* added tests
2023-07-19 11:17:15 +05:30
029f416fce Release version 0.5.0.dev0 (#717) 2023-07-17 16:30:46 +05:30
a1953baef6 FIX: Removes warnings about unknown pytest marker (#715)
This is a low prio PR but it solves an annoyance.

Right now, when running tests, the output is spammed by messages like:

> PytestUnknownMarkWarning: Unknown pytest.mark.multi_gpu_tests - is
this a typo? ...

This makes it more difficult to see the actually relevant information.
This PR fixes this by registering the two pytest markers we use, thus
removing the warnings.
2023-07-17 15:30:08 +05:30
e90dcc4be4 better hub kwargs management (#712) 2023-07-17 15:28:57 +05:30
71b326db68 FEAT: Make LoRA work with custom models (#676)
Enable custom models to work with LoRA

This PR enables custom models to work with LoRA in peft by performing a few
changes required for non-transformers models. New tests for linear,
transformers conv1d, and conv2d layers were added.

Not yet contained in this PR:

- support for AdaLoRA and IA³
- documentation
- examples

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-07-17 10:02:30 +02:00
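A short sketch of LoRA on a non-transformers model, along the lines this PR enables; the MLP and hyperparameters are arbitrary:

```python
import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(20, 64),
            nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, x):
        return self.layers(x)

# Target modules are matched by name, so we point at the Linear layers.
config = LoraConfig(target_modules=["layers.0", "layers.2"], r=8, lora_alpha=16)
model = get_peft_model(MLP(), config)
model.print_trainable_parameters()
out = model(torch.randn(4, 20))
```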
42ab10699b [Auto] Support AutoPeftModel for custom HF models (#707)
* support `AutoPeftModel` for custom HF models

* added documentation.
2023-07-15 14:18:34 +02:00
5a0e19dda1 [Feature] Save only selected adapters for LoRA (#705)
* v1 working for LoRA

* more checks

* fix prompt learning issues

* fix failing test

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fixed indentation

* move the check above

* added tests for adaption prompt, enc-dec and feature extraction

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-07-14 16:28:03 +02:00
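A usage sketch of saving only some adapters; the keyword argument name follows the PR description and may differ between PEFT versions, and the toy model and adapter names are illustrative:

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

base = nn.Sequential(nn.Linear(8, 8))
model = get_peft_model(
    base, LoraConfig(target_modules=["0"], r=4), adapter_name="adapter_a"
)
model.add_adapter("adapter_b", LoraConfig(target_modules=["0"], r=4))

# Save only one of the two adapters to disk.
model.save_pretrained("my_adapters", selected_adapters=["adapter_a"])
```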
86ad5ce55c [Core] Enhancements and refactoring of LoRA method (#695)
* refactor lora and add utils

1. Refactor LoRA code
2. Add method to delete LoRA adapters
3. Add method to unload the PEFT LoRA model.
4. Add `svd` weighted adapter support.
5. minor fixes

* fixes

* fixes

* Update lora.py

* fixes

* Update lora.py

* docstrings for the added public APIs

* docs

* Update src/peft/tuners/lora.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* resolve comments, refactoring and adding tests

* fix the remaining failing tests

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-07-14 19:44:51 +05:30
61a8e3a3bd [WIP] FIX for disabling adapter, adding tests (#683)
This PR deals with some issues with disabling adapter:

- typo in active.adapter
- prompt encoder could be on wrong device
- when using prompt learning + generate, disabling did not work

For the last point, there is a somewhat ugly fix in place for now,
pending a more comprehensive refactor (a comment was added to that
effect).

Comprehensive tests were added to check that everything works now.

The following tests still not working:

- adaption prompt
- seq2seq with prompt tuning/prompt encoding
- stable diffusion is a little bit flaky but test is hopefully robust enough

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-07-14 14:33:33 +02:00
0675541154 Introducing AutoPeftModelForxxx (#694)
* working v1 for LMs

* added tests.

* added documentation.

* fixed ruff issues.

* added `AutoPeftModelForFeatureExtraction` .

* replace with `TypeError`

* address last comments

* added comment.
2023-07-14 11:07:09 +02:00
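Basic usage looks like this; "user/my-lora-adapter" is a placeholder Hub id:

```python
from peft import AutoPeftModelForCausalLM

# Reads the adapter config, loads the referenced base model, and attaches the
# adapter in a single call.
model = AutoPeftModelForCausalLM.from_pretrained("user/my-lora-adapter")
model.eval()
```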
fa5957f7ca chore: add py.typed (#678)
Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-07-14 13:40:26 +05:30
5265eb7ebd Fix code typo in int8-asr.mdx (#698)
Having `bias="None"` in `LoraConfig` raised a `NotImplementedError`. Replaced it with `bias="none"` as per the [`LoraConfig` reference](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.LoraConfig) and now the code works and training runs.
2023-07-14 09:27:37 +02:00
878a8bc990 update Readme to include IA3 (#702) 2023-07-14 09:10:49 +02:00
b1bafca333 Fix a small bug in forward method of IA³ (#696) 2023-07-13 14:39:13 +02:00
92d38b50af add support for Feature Extraction using PEFT (#647)
* add support for embedding with peft

* add example and resolve code quality issues

* update notebook example post fixing the loss

* adding full example with inference notebook

* quality 

* add tests, docs, guide and rename task_type to be inline with Hub

* fixes

* fixes

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update peft_model.py

* fixes

* final fixes

* Update _toctree.yml

* fixes and make style and make quality

* deberta exception with checkpointing

* Update docs/source/task_guides/semantic-similarity-lora.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update docs/source/task_guides/semantic-similarity-lora.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* resolve comments

* testing prompt learning methods

* Update testing_common.py

* fix the tests

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2023-07-13 18:04:28 +05:30
5de5c24a8a Init IA³ weights randomly when so configured (#693)
Right now, no matter what the value of init_ia3_weights, these weights
are always initialized to be 1 (i.e. identity transforms). With this
fix, when init_ia3_weights=False, the weights are initialized randomly.
This is a setting mostly used for testing, so this fix has no user
impact.
2023-07-13 12:55:29 +02:00
062d95a09e FIX: base_model_torch_dtype when using model.half() after init (#688) 2023-07-13 11:12:40 +02:00
c33c42f158 Add functionality to support IA3 (#578)
* Added initial ia3 code

* Implemented ia3 correctly for feedforward layers; Fixed regex matching

* Fixed module mapping for mt5

* Merged changes from huggingface:main

* Merged changes

* Fixed lora merge conflicts

* Different bloom config

* Added save option for ia3

* Added loading code for ia3

* Added feedforward implementation in utils and seq cls example

* Added feedforward implementation in utils and seq cls example

* Implemented merge, unmerge, enable/disable adapters functionality

* Fixed feedforward during merge

* Debugging Merge

* Removing debug messages

* Cleaned up repo

* Removed non-IA3 changes

* Refactor save and load

* Added support to all models in tests; Added IA3Config for common tests

* Added half-precision support and test for gradient checkpointing; Formatted jupyter notebooks

* Added target modules for new models GPTBigCode and LLama

* Cleaned up code

* Cleaned up code

* Cleaned up example notebook

* Cleaned up  seq2seq notebook

* Corrected function docstrings; refactored find_and_replace

* Corrected function docstrings; refactored find_and_replace

* Added basic docs for IA3

* Added new conceptual guide in source tree for documentation

* Minor fix to documentation

* Minor fixes to docstrings; Added error handling for 4bit quantization; Cleaned unused merge/unmerge methods

* styling changes after merge from main

* Update src/peft/tuners/ia3.py

Remove unused attribute merge_weights

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

---------

Co-authored-by: Abhishek2304 <abhishekgupta2304@gmail.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-07-13 13:15:50 +05:30
c46d76ae3a Update Dockerfile (#684) 2023-07-11 18:41:52 +02:00
4f542e319f Fix embedding LoRA weights initialization (#681)
When init_lora_weights=False, embedding LoRA weights were initialized as
all zeros, resulting in LoRA becoming an identity transform. This is
inconsistent with other module types, where init_lora_weights=False
results in random initialization and thus a non-identity operation.

As init_lora_weights=False is just for internal testing, users should
not be affected by this change. In fact, I updated the doc of this
parameter to - hopefully - better reflect this.

There is no direct test for this change. However, there are tests
in #676 that will fail without this fix, so it is tested indirectly.
2023-07-11 12:20:26 +02:00
b5e341bb8a Added wandb support for lora train_dreambooth (#639)
* Update train_dreambooth.py

Accelerator init updated from logging_dir to project_dir. Newer versions of accelerate use project_dir; logging_dir is deprecated.

* Bugfix: adapter name variable inserted; changing LORA_ADAPTER_NAME used to cause an error

* Adapter name added as kwarg

* Black code formatted

* Style & Quality check

* Wandb import added for logging and project initialization

* Wandb import added for logging and project initialization

* fix project_name

* print tqdm progress to wandb
2023-07-11 13:56:03 +05:30
06fd06a4d2 Remove skipping certain tests (#668)
The generate tests so far were skipped for non-lora, non-prefix tuning
cases. However, those cases are now passing, so it is no longer
necessary to skip the tests.
2023-07-07 14:19:10 +02:00
7d1d959879 Adding support for RoBERTa layers_transform in COMMON_LAYERS_PATTERN (#669)
* fix: add pattern layer to support RoBERTa layers_transform

* chore: fix code quality error
2023-07-07 14:19:01 +02:00
39ef2546d5 Update clm-prompt-tuning.mdx (#652)
Fixed typo that prevented training.
2023-07-06 09:21:23 -07:00
9f7492577f Fix bug resulting in config copies not working (#653)
Resolves #424

The bug was caused by __dict__ being overwritten to return a copy of the
dataclass. This can lead to unpredictable behavior, as shown in the
issue. This fix removes the __dict__ property and preserves the
original behavior where needed.

All three added tests would fail without the fix.
2023-07-06 09:06:41 +02:00
bef8e3584c [docs] API example (#650)
* api example

* apply feedback

* fix format

* make style
2023-07-05 11:19:20 -07:00
032fff92fb Fixed LoraConfig alpha modification on add_weighted_adapter (#654)
* Fixed LoraConfig modification on add_weighted_adapter

* Added test for issue with adding weighted adapter for LoRA

* Fixed formatting
2023-07-01 11:13:25 +05:30
6c8659f8f9 Require Python version 3.8 (#649) 2023-06-30 14:01:41 +02:00
5884bdbea4 Add pytest-cov for reporting test coverage (#641)
As discussed, this adds line coverage to the tests. This will allow us
to identify parts of the code that are missing coverage and make it
easier to ensure newly added code is well covered.

At the moment, CI is not set up to report if new, uncovered code is
being added. We could add codecov to the CI to get this functionality,
but having 100% coverage for new code is not always desired, so it's
debatable if it is needed.

Right now, there are multiple test commands (normal, single, multi GPU).
For each individual command, the coverage report would only include the
lines covered by that command, so the total coverage would be
underreported. It is possible to combine multiple coverage reports into
a single report:

https://coverage.readthedocs.io/en/stable/cmd.html#cmd-combine

Combining the reports will be added in a future PR.
2023-06-30 14:01:02 +02:00
86290e9660 style: tentatively add hints for some public function (#614)
* style: tentatively add hints for some public function

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: import annotations to evaluate to str

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: style

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-28 12:33:16 +05:30
563acf0832 Remove loralib reqs from examples, small fixes (#640)
- As discussed, loralib is no longer required, so the examples from the
  docs have been updated to no longer require loralib as a dependency
- In one example, a missing torch import was added
- In another example, a missing line was added (output of that line is
  shown, but not the line itself)
2023-06-28 12:23:09 +05:30
f4526d57fc importing peft with an old version of bitsandbytes causes an exception (#642) (#646) 2023-06-28 00:52:06 +02:00
d9b0a118af Update peft_model.py (#644) 2023-06-27 23:41:51 +02:00
f5352f08c5 feat(model): Allow from_pretrained to accept PeftConfig class (#612)
* feat(model): Allow from_pretrained to accept PeftConfig class

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* tests: add test cases for config construction

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* chore: address comments and run tools

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

* fix: style

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
2023-06-27 18:27:57 +05:30
48ffd07276 fix ptun and prompt tuning generation issue (#543)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-06-27 16:56:47 +05:30
eb01b5ee1d fix Prefix-tuning error in clm Float16 evaluation (#520)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2023-06-27 13:57:21 +05:30
a7ea02a709 [Bugfix] Inserted adapter_name to get_peft_model_state_dict function (#626)
* Update train_dreambooth.py

Accelerator init updated from logging_dir to project_dir. Newer versions of accelerate use project_dir; logging_dir is deprecated.

* Bugfix: adapter name variable inserted; changing LORA_ADAPTER_NAME used to cause an error

* Adapter name added as kwarg

* Black code formatted

* Style & Quality check
2023-06-27 13:56:54 +05:30
66fd087205 [Bugfix] Fixed LoRA conv2d merge (#637)
* Fixed LoRA conv2d merge

* Fixed typo
2023-06-27 12:18:08 +05:30
0e8932f1cb Add seq2seq prompt tuning support (#519)
* Added prompt tuning for seq2seq and corresponding notebook examples

* Added prompt tuning for seq2seq and corresponding notebook examples

* Added prompt tuning for seq2seq and corresponding notebook examples

* Call encoder with get_encoder() and update notebook example

* Style formatting

* Add seq2seq p-tuning support, and improve seq2seq prompt tuning support, enabling the use of generate()

* Fix imports

* Fix imports

* Add co-author.

Co-authored-by: ZhengxiangShi michaelszx117@gmail.com

* Add co-author.

Co-authored-by: ZhengxiangShi <michaelszx117@gmail.com>

---------

Co-authored-by: Thomas SCHILLACI <tschilla@px101.prod.exalead.com>
Co-authored-by: ZhengxiangShi <michaelszx117@gmail.com>
2023-06-27 11:45:49 +05:30
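A minimal sketch of prompt tuning for a seq2seq model with generate(), using a small model purely for illustration:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PromptTuningConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("t5-small")
base = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    num_virtual_tokens=20,
)
model = get_peft_model(base, config)

inputs = tokenizer("translate English to German: Hello", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```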
e2b8e3260d [AdaptionPrompt] Add 8bit + 4bit support for adaption prompt (#604)
* add 8bit support for adaption prompt

* add 4bit support
2023-06-26 15:44:51 +02:00
c476c1e348 add adalora 4bit (#598) 2023-06-26 15:09:22 +02:00
18544647ac Update train_dreambooth.py (#624)
Accelerator init updated from logging_dir to project_dir. Newer versions of accelerate use project_dir; logging_dir is deprecated.
2023-06-26 18:18:24 +05:30
8af8dbd2ec Update README.md, citation (#616)
bibtex was giving me a "too many commas" error; this fixes it.
2023-06-23 15:59:34 +05:30
39fc09ec1b update whisper test (#617) 2023-06-23 14:37:43 +05:30
016722addd Added Civitai LoRAs conversion to PEFT, PEFT LoRAs conversion to webui (#596)
* Fixed kohya_ss to peft lora conversion, added script for backward conversion

* Fixed getting alpha from PEFT

---------

Co-authored-by: Alexander Kovalchuk <a.kovalchuk@prequelapp.com>
2023-06-21 19:34:39 +05:30
fd10faedfa stronger import of bnb (#605) 2023-06-21 17:45:04 +05:30
702d06098e add adapter_name in get_peft_model (#610) 2023-06-21 17:43:40 +05:30
0b62b4378b fix final failing tests (#609) 2023-06-20 14:15:00 +02:00
b8b84cb6ce [tests] Fix dockerfile (#608)
* fix dockerfile and test

* relax constraints

* fix

* fix log reports and empty cache

* revert workflow

* add librosa
2023-06-20 12:33:14 +02:00
08cb3dde57 Improve the README when using PEFT (#594)
* add logic

* Update peft_model.py

* fix test failures

* fixes

* fix
2023-06-19 14:19:41 +05:30
03eb378eb9 feat: Add PeftModelForQuestionAnswering (#473)
* Added first try of supporting QuestionAnswering

* Updated example to be correct

* Added changes from PR 404

* Added missing mapping for task type

* Remove unrelated code

* Run make style
2023-06-16 16:53:58 +05:30
6b81d7179f when from_pretrained is called in finetune of lora with flag "is_trainable" True, should not call model.eval() (#591) 2023-06-16 16:34:07 +05:30
0270b7c780 add more CI tests (#586) 2023-06-16 11:06:48 +02:00
38e9c650ba Fix typo at peft_model.py (#588)
Fix typo in description:
- `imputs_embeds` to `inputs_embeds`
2023-06-16 14:28:51 +05:30
9320373c12 LoRA for Conv2d layer, script to convert kohya_ss LoRA to PEFT (#461)
* Added LoRA for Conv2d layer, script to convert kohya_ss linear lora to PEFT

* Fixed code style, added missing safetensors dependency for kohya_ss to peft conversion script
2023-06-15 16:03:38 +05:30
019b7ff9d6 fix adalora device mismatch issue (#583) 2023-06-15 12:25:36 +02:00
b519e3f9e1 [core] Correctly passing the kwargs all over the place (#575)
* v1 of the fix

* forward contrib credits from discussions

* add tests

---------

Co-authored-by: winglian <winglian@users.noreply.github.com>
2023-06-15 12:23:05 +02:00
e48dfc331c Fix minor typo bug-report.yml (#582) 2023-06-15 10:41:03 +02:00
4d51464045 enable lora for mpt (#576)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-06-15 13:05:43 +05:30
8563a63af2 [BugFix] Set alpha and dropout defaults in LoraConfig (#390)
* Set alpha and dropout defaults in LoraConfig

* Update src/peft/tuners/lora.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-06-15 12:41:35 +05:30
eb75374fb1 add issue template (#562) 2023-06-13 23:03:12 +05:30
1cbc985018 feat: add type hint to get_peft_model (#566)
Signed-off-by: samsja <sami.jaghouar@hotmail.fr>
2023-06-13 23:02:54 +05:30
58f4dee67a Fix typo and url to openai/whisper-large-v2 (#563) 2023-06-13 09:49:20 -07:00
a8d11b36a3 [core] Fix config kwargs (#561)
* fix config kwargs

* style

* fix order
2023-06-13 17:54:49 +02:00
189a6b8e35 [core] Add safetensors integration (#553)
* add v1

* clean up

* more improvements

* add device

* final adjustements

* use `EntryNotFoundError`

* better checks

* add tests and final fixes

* make style && make quality

* remove `push_to_hub` because of the release
2023-06-09 12:33:13 +02:00
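A usage sketch of saving and reloading adapter weights in the safetensors format; the toy model and directory name are illustrative:

```python
import torch.nn as nn
from peft import LoraConfig, PeftModel, get_peft_model

base = nn.Sequential(nn.Linear(8, 8))
model = get_peft_model(base, LoraConfig(target_modules=["0"], r=4))

# Save the adapter weights as safetensors ...
model.save_pretrained("my_adapter", safe_serialization=True)

# ... and load them back onto a fresh copy of the base model.
reloaded = PeftModel.from_pretrained(nn.Sequential(nn.Linear(8, 8)), "my_adapter")
```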
e45529b149 improve code readability (#409) 2023-06-08 14:45:35 +05:30
ba7b1011b8 [doc build] Use secrets (#556) 2023-06-07 18:41:51 +02:00
c23be52881 add thousands separator in print_trainable_parameters (#443) 2023-06-07 18:09:17 +05:30
7fb5f90a38 add library name to model card (#549) 2023-06-05 18:44:40 +05:30
fcff23f005 Remove device_map when training 4,8-bit model. (#534)
* Remove device_map when training 4,8-bit model.

* Fix style
2023-06-02 13:37:46 +05:30
42a184f742 Fix a minor typo where a non-default token_dim would crash prompt tuning (#459) 2023-06-01 15:14:06 +05:30
7add756923 Add starcoder model to target modules dict (#528)
* Add starcoder model to target modules dict

* make style and quality
2023-06-01 14:49:58 +05:30
9914e76d5b Fixed problem with duplicate same code. (#517) 2023-06-01 14:47:05 +05:30
668f045972 return load_result when load_adapter (#481) 2023-06-01 14:46:38 +05:30
38f48dd769 fix merge_and_unload when LoRA targets embedding layer (#438) 2023-06-01 14:45:06 +05:30
db55fb34b8 [Llama-Adapter] fix half precision inference + add tests (#456)
* fix + add tests

* forward contrib credits from discussions

---------

Co-authored-by: HamidShojanazeri <HamidShojanazeri@users.noreply.github.com>
2023-06-01 14:44:11 +05:30
76d4ecd40d Enable PeftConfig & PeftModel to load from revision (#433)
* Enable PeftConfig to load from revision

* Add revision to PeftModel

* Fix weights download with revision
2023-06-01 14:39:54 +05:30
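A usage sketch of loading a config and adapter from a specific revision; the Hub id and revision below are placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import PeftConfig, PeftModel

peft_model_id = "user/my-lora-adapter"  # placeholder Hub id
revision = "v1.0"                       # placeholder branch/tag/commit

config = PeftConfig.from_pretrained(peft_model_id, revision=revision)
base = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base, peft_model_id, revision=revision)
```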
27f956a73b [LoRA] Allow applying LoRA at different stages (#429)
* working v1

- working v1
- added tests
- needs some documentation

* more fixes

- stronger tests
- documentation
- remove unneeded common layers pattern

* add more docstring

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* quality & style

* style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-06-01 09:35:24 +02:00
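A sketch of restricting LoRA to specific layers via the config options added here; gpt2 and the layer indices are just an example:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

# Apply LoRA only to the attention projections of the first two transformer
# blocks instead of every matching layer.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["c_attn"],
    layers_to_transform=[0, 1],
    layers_pattern="h",  # gpt2 stores its blocks under transformer.h
    r=8,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```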
dd1c0d87fe change comment in tuners.lora, lora_alpha float to int (#448) 2023-06-01 12:22:00 +05:30
207d290865 [docs] Prettify index (#478)
* prettify index

* fix format
2023-05-31 09:32:22 -07:00
5e8ee44091 fix (#524) 2023-05-31 09:22:27 -07:00
662ebe593e [core] Add gradient checkpointing check (#404)
* add automatic input enable gradients when calling `get_peft_model`

* style

* better check

* add 4bit check
2023-05-31 12:14:27 +02:00
c42968617b Remove merge_weights (#392) 2023-05-31 11:38:12 +05:30
3714aa2fff [core] Raise warning on using prepare_model_for_int8_training (#483)
* raise warning on using older method

* Update src/peft/utils/other.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* quality

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2023-05-22 17:47:32 +02:00
0fcc30dd43 [core] Protect 4bit import (#480)
* protect 4bit import

* fix CI

* better check for python 3.7
2023-05-21 00:14:43 +02:00
d6015bc11f 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) (#476)
* 4bit lora

* 4bit test

* fixing 4bits bugs

* fp4 pass variables

* fix inference datatype and generation config

* updating prep for int8 function to work for 4-bit

* Added FP4 LoRA and FP4 fine-tuning example.

* LinearFP4 -> Linear4bit

* fixes

* Fixed 4-bit example.

* Style changes.

* final changes

---------

Co-authored-by: Artidoro Pagnoni <pagnoni.artidoro@gmail.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
2023-05-20 17:47:15 +02:00
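A minimal QLoRA-style setup sketch; it assumes a CUDA GPU with bitsandbytes installed, and the model id and hyperparameters are only examples:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", quantization_config=bnb_config
)
base = prepare_model_for_kbit_training(base)

config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=16, lora_alpha=32, lora_dropout=0.05)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```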
4fd374e80d add sound file to docker images (#401) 2023-05-10 09:00:07 +02:00
3d7770bfd5 Update README.md (#399) 2023-05-10 12:13:14 +05:30
f173f97e9d Fix documentation links on index page (#406) 2023-05-10 12:06:57 +05:30
ef8523b5a4 fix index alignment? (#397) 2023-05-10 12:05:17 +05:30
63c5c9a2c0 [CI] Fix CI - pin urlib (#402)
* fix CI - pin urlib

* revert
2023-05-10 11:56:51 +05:30
5ed95f49d0 add accelerate example for DDP and FSDP in sequence classification for non-lora case (#358)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-05-10 10:10:30 +05:30
8a3fcd060d do not use self.device. In FSDP cpu offload mode. self.device is "CPU" instead of "cuda" (#352)
which leads to errors like "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:1"

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2023-05-10 10:09:28 +05:30
b1059b73aa Release: v0.4.0.dev0 (#391) 2023-05-04 01:38:33 +05:30
1a1cfe3479 fix INT8 prepare function (#389)
* fix INT8 prepare function

* remove unused function args

* fix related tests, examples and docs
2023-05-03 12:47:53 +05:30
8e53e16005 Fix missing arg for transpose in AdaLora (#347)
`transpose() missing 1 required positional argument: 'fan_in_fan_out'`
2023-05-03 12:38:22 +05:30
6a18585f25 Add nn.Embedding Support to Lora (#337)
* add lora embedding

* reformat lora.py using black formatter

* Add Embedding Layer to add_weighted_adapter and address PR

* Refactor unused fan_in_fan_out variable, fix and test bugs in Embedding's merge and unmerge methods

---------

Co-authored-by: splo2t <yu990410@gmail.com>
2023-05-03 12:37:56 +05:30
e509b8207d Fix a link to the example script (#383)
* Fix a link to the example script

* Update README.md
2023-05-03 12:21:15 +05:30
a37156c2c7 [CI] Fix nightly CI issues (#375)
* Update log_reports.py

* Update nightly.yml

* Update Makefile

* Update Makefile

* fixes

* add tabulate

* fix setup

* final fix

* fix nits
2023-05-02 09:59:54 +02:00
632997d1fb [docs] Quicktour update (#346)
* clean up quicktour

* finish draft

* fix format

* apply feedback

* apply feedback

* oopsie
2023-04-27 10:08:22 -07:00
0c1e3b470a [WIP-docs] Accelerate scripts (#355)
* deepspeed script

* apply feedback

* fsdp

* toctree
2023-04-27 09:51:42 -07:00
e8f66b8a42 [docs] Supported models tables (#364)
* supported models moved to index

* minor fix

* Update docs/source/index.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply feedback

* Update docs/source/index.mdx

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-26 09:03:40 -04:00
2c2bbb4064 Use try and finally in disable_adapter() to catch exceptions (#368) 2023-04-26 18:26:14 +05:30
3890665e60 [core] Better log messages (#366)
* better log messages

* better log messages

* change order
2023-04-26 12:30:32 +05:30
49a20c16dc [tests] add slow tests to GH workflow (#304)
* add slow tests to GH workflow

* add correct channel
2023-04-25 12:12:16 +02:00
af1849e805 Implement adaption prompt from Llama-Adapter paper (#268)
* Implement adaption prompt from Llama-Adapter paper

* Support multi-adapters

* Refactor adaption prompt to target attn modules instead of layers

* Refactor adaption prompt to be more generic

* Fix adaption prompt not on right device

* Apply suggestions from code review

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* Fix style

* Add support for Llama config use_cache=True

* Fix rebase issues

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-25 12:24:18 +05:30
2822398fbe [docs] Conceptual overview of prompting methods (#339)
* conceptual overview of prompting methods

* fix image caption

* minor edits

* update toctree
2023-04-20 07:29:29 -07:00
1ef4b61425 [docs] LoRA conceptual guide (#331)
* WIP LoRA conceptual guide

* conceptual guide for LoRA

* Update docs/source/conceptual_guides/lora.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/conceptual_guides/lora.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* feedback addressed

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-04-20 08:02:38 -04:00
f703cb2414 [docs] Task guide with Dreambooth LoRA example (#330)
* dreambooth lora training part

* dreambooth lora: finetuning and inference

* title update

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* multi-adapter inference + feedback addressed

* make style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-20 07:50:18 -04:00
9413b555c4 fix lora modules_to_save issue (#343)
* fix lora modules_to_save issue

* fix quality
2023-04-20 16:16:13 +05:30
8818740bef [docs] int8 training (#332)
* int8 training

* apply feedback
2023-04-19 12:49:19 -07:00
34027fe813 [docs] LoRA for token classification (#302)
* everything but training

* add to toctree

* complete training section with Trainer

* apply feedback
2023-04-18 08:14:17 -07:00
0bdb54f03f fix gathering for metrics (#327) 2023-04-18 15:49:44 +05:30
4ee024846b feat(ci): add pip caching to CI (#314) 2023-04-18 12:12:34 +02:00
26577aba84 Fix merge_and_unload when having additional trainable modules (#322) 2023-04-17 14:37:17 +02:00
b21559e042 [docs] P-tuning for sequence classification (#281)
* first draft

* minor edits

* apply feedback

* use trainer
2023-04-14 12:06:18 -07:00
c0209c35ab [docs] Task guide for semantic segmentation with LoRA (#307)
* WIP: semantic segmentation example

* make style

* some polishing

* Update docs/source/task_guides/semantic_segmentation_lora.mdx

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* minor update

---------

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-14 09:31:19 -04:00
070e3f75f3 [tests] add CI training tests (#311)
* add training tests

* styling
2023-04-14 14:34:01 +02:00
4ca286c333 Fix: unexpected keyword argument 'has_fp16_weights' (#299)
When iterating over the list of named modules, if a Linear8BitLt module came before a non-8bit
module, the kwargs would be mutated to include 'has_fp16_weights' and other parameters, and
the next time it had to instantiate a 'torch.nn.Linear' object, the constructor would error out
because **kwargs contained keyword arguments that were not appropriate for the base constructor.

This fixes that by copying kwargs when sending the extra data to Linear8BitLt.
2023-04-14 17:59:37 +05:30
59778af504 Change gather for gather_for_metrics in eval. (#296)
Otherwise the assert at L259 breaks. It is also the right thing to do to get more accurate metrics.
2023-04-14 17:57:15 +05:30
10a2a6db5d [docs] Prompt tuning for CLM (#264)
* first draft

* uncomment

* minor edits

* apply feedback
2023-04-13 15:59:15 -07:00
70af02a2bc fix and update examples and readme (#295)
* fix and update examples and readme

* fix formatting issues

* add weighted lora adapters support

* Update README.md
2023-04-13 00:17:13 +05:30
cc82b674b5 [test] Add Dockerfile (#278)
* add peft-gpu dockerfile

* add workflow file

* add cpu dockerfile

* fix dockerfiles

* Apply suggestions from code review

* Update .github/workflows/build_docker_images.yml

* Update .github/workflows/build_docker_images.yml
2023-04-11 10:27:56 +02:00
6ba67723df Merge pull request #288 from bigeagle/main
Fix lora_dropout operator type when dropout=0
2023-04-11 11:09:06 +05:30
202f1d7c3d Merge pull request #272 from stevhliu/prefix-tuning-seq2seq
[docs] Prefix tuning for Seq2Seq
2023-04-10 09:42:51 -07:00
e1c41d7183 minor edits 2023-04-10 09:32:15 -07:00
053573e0df fix lora_dropout operator type when dropout=0 2023-04-10 21:03:07 +08:00
1117d47721 Merge pull request #283 from huggingface/smangrul/multi-lora-support
fix trainable params setting
2023-04-08 13:57:08 +05:30
f982d75fa0 Merge pull request #261 from younesbelkada/fix-half-prec
Fix half precision forward
2023-04-08 13:56:36 +05:30
fdebf8ac4f Merge pull request #256 from younesbelkada/add-gpu-tests
[`tests`] Adds GPU tests
2023-04-08 13:56:18 +05:30
ff282c2a8f Update peft_model.py 2023-04-08 11:45:32 +05:30
7b7038273a fix trainable params issue 2023-04-08 11:32:00 +05:30
0422df466e Fix typo in examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py (#275) (#277)
Co-authored-by: rmilleti <rmilleti@amazon.com>
2023-04-07 17:49:35 +02:00
f35b20a845 add and fix tests 2023-04-07 10:48:22 +00:00
c22a57420c Merge remote-tracking branch 'upstream/main' into fix-half-prec 2023-04-07 10:36:58 +00:00
04689b6535 make style 2023-04-07 10:35:39 +00:00
bd1b4b5aa9 Merge branch 'main' into add-gpu-tests 2023-04-07 12:34:16 +02:00
445940fb7b Merge pull request #263 from huggingface/smangrul/multi-lora-support
Multi Adapter support
2023-04-07 04:40:59 +05:30
e8b0085d2b fixing adalora saving and loading 2023-04-07 04:08:10 +05:30
31560c67fb first draft 2023-04-06 14:10:37 -07:00
d5feb8b787 fixing 🐛 2023-04-06 21:17:54 +05:30
3258b709a3 fix 🐛 2023-04-06 20:41:36 +05:30
a591b4b905 final fix I guess 2023-04-06 20:04:04 +05:30
3aaf482704 fix 2023-04-06 20:02:31 +05:30
07a4b8aacc fix 🐛 2023-04-06 19:56:55 +05:30
b6c751455e 😅 2023-04-06 19:31:21 +05:30
dee2a96fea Update adalora.py 2023-04-06 19:22:56 +05:30
b728f5f559 🐛 fixing 2023-04-06 19:20:13 +05:30
74e2a3da50 😅 2023-04-06 19:16:37 +05:30
1a6151b91f Merge branch 'main' into smangrul/multi-lora-support 2023-04-06 19:10:48 +05:30
7397160435 making adalora compatible with multiple adapters 2023-04-06 19:06:33 +05:30
75808eb2a6 Merge branch 'main' into smangrul/multi-lora-support 2023-04-06 19:05:31 +05:30
382b178911 Merge pull request #260 from younesbelkada/add-pix2struct
Add BLIP2 Example
2023-04-06 11:45:55 +05:30
a7d5e518c3 Merge pull request #233 from QingruZhang/main
The Implementation of AdaLoRA (ICLR 2023)
2023-04-06 11:30:30 +05:30
072da6d9d6 Run make style and make quality 2023-04-05 20:52:12 +00:00
4f8c134102 raise exception for MergedLinear of AdaLoRA 2023-04-05 20:40:06 +00:00
b8a57a3649 Update src/peft/tuners/adalora.py
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-05 16:32:23 -04:00
d892beb0e7 Update src/peft/mapping.py
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-05 16:31:58 -04:00
9a534d047c Update src/peft/tuners/adalora.py
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-05 16:31:46 -04:00
c240a9693c Update src/peft/tuners/adalora.py
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-05 16:31:08 -04:00
b3e6ef6224 Update src/peft/tuners/adalora.py
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-05 16:30:16 -04:00
3e6a88a8f9 Update src/peft/tuners/adalora.py
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-04-05 16:23:48 -04:00
37e1f9ba34 fix test 2023-04-05 17:17:01 +00:00
d936aa9349 fix tests 2023-04-05 19:54:02 +05:30
7888f699f5 Merge branch 'main' into smangrul/multi-lora-support 2023-04-05 14:09:01 +05:30
deff03f2c2 [tests] Adds more tests + fix failing tests (#238)
* adds more tests

- refactor tests
- add enc-dec tests
- skips generate tests for non-lora adapters

* rm unneeded file

* fix tests

* fix

* more checks

* fix issue
2023-04-05 10:17:26 +02:00
405f68f54a fix doc failure 2023-04-05 12:40:02 +05:30
44f3e86b62 Update config.py 2023-04-05 03:35:08 +05:30
75131959d1 fix 🐛 2023-04-05 02:03:31 +05:30
b9433a8208 😅 2023-04-05 01:39:51 +05:30
6f1f26f426 😅 2023-04-05 01:30:07 +05:30
dbdb8f3757 fix more 🐛 2023-04-05 01:19:19 +05:30
41b2fd770f 😅 2023-04-05 01:13:41 +05:30
d4c2bc60e4 fix more 🐛 2023-04-05 01:04:42 +05:30
122f708ae8 😅. Fix 🐛 2023-04-04 20:05:59 +05:30
18ccde8e86 fixing 🐛 2023-04-04 19:44:58 +05:30
d4b64c8280 fix 🐛 2023-04-04 18:27:23 +05:30
96ca100e34 Update lora.py 2023-04-04 18:03:32 +05:30
bd80d61b2a fix 🐛 2023-04-04 17:44:24 +05:30
6e0f124df3 Merge branch 'main' into smangrul/multi-lora-support 2023-04-04 17:04:32 +05:30
7ed9ad04bf revert changes 2023-04-04 11:01:07 +00:00
8266e2ee4f fix half precision forward 2023-04-04 10:01:47 +00:00
8c83386ef4 few fixes 2023-04-04 08:37:32 +00:00
4cbd6cfd43 revert 2023-04-04 08:31:37 +00:00
96cd039036 fix 2023-04-04 08:29:51 +00:00
46ab59628c revert 2023-04-04 08:23:47 +00:00
f569bc682b Update src/peft/peft_model.py 2023-04-04 10:21:38 +02:00
af6794e424 add blip2 2023-04-04 08:18:47 +00:00
c7e22ccd75 v1 2023-04-04 07:59:03 +00:00
3d1e87cb78 Merge remote-tracking branch 'upstream/main' into add-pix2struct 2023-04-04 07:58:50 +00:00
c2ef46f145 v1 2023-04-04 07:58:48 +00:00
e29d6511f5 more description 2023-04-04 07:06:57 +00:00
4d3b4ab206 add whisper tests 2023-04-04 06:56:14 +00:00
127a74baa2 Merge pull request #257 from toncho11/main
Fixing a bug where a wrong parameter name is used for the offload_folder
2023-04-03 22:30:31 +05:30
93f1d35cc7 Merge pull request #250 from tpoisonooo/patch-1
Update other.py
2023-04-03 22:19:50 +05:30
697b6a3fe1 Merge pull request #252 from guspan-tanadi/main
docs: have fix bit typo README
2023-04-03 22:18:42 +05:30
9299c88a43 Merge pull request #254 from huggingface/younesbelkada-patch-1
[`Automation`] Update stale.py
2023-04-03 22:17:34 +05:30
2fe22da3a2 fix CI 2023-04-03 16:47:09 +00:00
50191cd1ec Merge pull request #255 from huggingface/stas00-patch-1
[resources] replace pdf links with abs links
2023-04-03 09:47:08 -07:00
c2e9a6681a fix import 2023-04-03 16:40:02 +00:00
2b8c4b0416 remove from init 2023-04-03 16:38:53 +00:00
519c07fb00 add import_utils 2023-04-03 16:30:08 +00:00
ff9a1edbfd Fixing a bug where a wrong parameter name is used. 2023-04-03 18:28:11 +02:00
8058709d5a fix failing CIs 2023-04-03 16:27:30 +00:00
f413e3bdaf v1 GPU tests 2023-04-03 16:11:10 +00:00
45d7aab39a Update README.md 2023-04-03 08:51:01 -07:00
4ddb85ce1e Update stale.py 2023-04-03 17:08:42 +02:00
dd30335ffd [Automation] Add stale bot (#247)
* add stale bot

* fix
2023-04-03 14:31:11 +02:00
39cbd7d8ed docs: have fix bit typo README
Improve readability
2023-04-03 16:13:33 +07:00
7ef47be5f5 Update other.py
typo
2023-04-03 14:02:13 +08:00
e536616888 [core] Fix offload issue (#248)
* fix offload dir

* remove offload index

* safety checker

* forward contrib credits from previous PR

---------

Co-authored-by: cosimoiaia <cosimoiaia@users.noreply.github.com>
2023-04-01 14:54:46 +02:00
11edb618c3 Merge pull request #240 from stevhliu/build-notebooks
[docs] Build notebooks from Markdown
2023-03-31 17:01:09 -07:00
cfe992f0f9 make style 2023-03-31 16:54:12 -07:00
f948a9b4ae build notebooks 2023-03-31 16:45:11 -07:00
86f4e45dcc Merge pull request #241 from stevhliu/add-api-docs
[docs] Add API references
2023-03-31 14:48:35 -07:00
8e61e26370 fix kwargs 2023-03-31 14:41:14 -07:00
622a5a231e clean up docstrings 2023-03-31 14:30:05 -07:00
7c31f51567 use explicit path 2023-03-31 13:46:54 -07:00
47f05fe7b5 fix path to loralayer too 2023-03-31 13:46:02 -07:00
8fd53e0045 fix path to peftconfigmixin? 2023-03-31 13:46:02 -07:00
39fb96316f first draft of api docs 2023-03-31 13:41:33 -07:00
165ee0b5ff Merge pull request #239 from MKhalusova/task-guide-image-classification
Move image classification example to the docs
2023-03-31 11:15:01 -04:00
221b39256d feedback addressed 2023-03-31 09:34:00 -04:00
de2a46a2f9 version fix 2023-03-31 09:29:41 -04:00
8a6004232b Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-03-31 08:55:31 -04:00
1d01a70d92 Merge pull request #243 from guspan-tanadi/main
Have fix typo in README
2023-03-31 10:24:25 +05:30
4d27c0c467 Have fix typo in README
notebook provider name in capitalization
2023-03-31 09:41:00 +07:00
9ced552e65 doc building fixes 2023-03-30 12:29:29 -04:00
d49cde41a7 make style 2023-03-30 12:09:28 -04:00
e4dcfaf1b3 task guide based on notebook 2023-03-30 11:30:55 -04:00
8f63f565c6 [utils] add merge_lora utility function (#227)
* add merge_lora utility function

* forward contrib credits from original script

* some changes

* make style

* fix tets

* finally fix tests

* Update tests/test_peft_model.py

* adapt from suggestions

* adapt

* Update src/peft/tuners/lora.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

* fix 8bit

* Update src/peft/tuners/lora.py

Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>

---------

Co-authored-by: edbeeching <edbeeching@users.noreply.github.com>
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-03-30 13:45:37 +02:00
f15548ebeb Merge branch 'huggingface:main' into main 2023-03-30 02:24:45 -04:00
d4292300a0 Finish the test for model load and save 2023-03-30 06:19:45 +00:00
e3b4cd4671 Implement the save_pretrained for AdaLoRA 2023-03-30 01:59:26 -04:00
300abd1439 refine the key of rank pattern 2023-03-30 05:53:04 +00:00
ccf53ad489 Save the rank pattern into the config file 2023-03-30 05:41:28 +00:00
542f2470e7 Merge pull request #214 from huggingface/smangrul/add-docs
add docs
2023-03-30 11:10:14 +05:30
98f51e0876 Merge branch 'main' into smangrul/add-docs 2023-03-30 10:46:37 +05:30
c7b5280d3c Merge pull request #231 from aitor-gamarra/main
Show CONFIG_NAME instead of "config.json"
2023-03-30 10:29:05 +05:30
d3a48a891e save rank pattern 2023-03-30 01:04:11 +00:00
ce61e2452a define the resize function 2023-03-29 21:03:48 -04:00
d6ae6650b2 Finish the test for rank finalization 2023-03-30 00:45:33 +00:00
1141b125d0 Impelment the budget finalization 2023-03-29 19:56:23 -04:00
d6c68ae1a5 Show CONFIG_NAME instead of "config.json" 2023-03-29 21:03:39 +02:00
df71b84341 [CI] Add more ci tests (#223)
* add more tests

* fix

* add generate tests

* make style

* fix test

* add -n

* skip llama
2023-03-29 15:28:38 +02:00
d8d1007732 Causal LM generation fix for prefix tuning: GPT2 model (#222)
* expand attention mask after preparing generation inputs for prefix tuning

* reformat

* Update src/peft/peft_model.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* reformat as per black

---------

Co-authored-by: Vineet Kumar <vineeku6@in.ibm.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-03-29 15:20:14 +02:00
51f49a5fe4 Merge pull request #224 from younesbelkada/fix-ci-dreambooth
Fix failing test on `main`
2023-03-29 17:15:42 +05:30
4626b36e27 addressing remaining comments 2023-03-29 17:12:32 +05:30
d242dc0e72 merge the conflit' 2023-03-28 23:19:14 -04:00
2c84a5ecdd Merge remote-tracking branch 'upstream/main' 2023-03-28 23:13:48 -04:00
002da1b450 fix bugs 2023-03-28 20:19:06 +05:30
7c8ee5814a Update peft_model.py 2023-03-28 19:40:06 +05:30
090d074399 Update lora.py 2023-03-28 19:38:22 +05:30
8ec7cb8435 Update lora.py 2023-03-28 19:36:41 +05:30
e9d45da4c5 Update lora.py 2023-03-28 19:35:49 +05:30
64cae2aab2 Update lora.py 2023-03-28 19:34:04 +05:30
7d7c598647 Update peft_model.py 2023-03-28 19:32:21 +05:30
af252b709b Update peft_model.py 2023-03-28 19:29:24 +05:30
891584c8d9 fix ci dreambooth 2023-03-28 13:55:43 +00:00
c21afbe868 multi adapter for training and inference
Might have breaking changes
2023-03-28 18:56:24 +05:30
13476a807c Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2023-03-27 13:44:00 +05:30
3d00af4799 add docs 2023-03-24 13:16:26 +05:30
098962fa65 Merge pull request #204 from haofanwang/patch-1
Update train_dreambooth.py
2023-03-24 10:29:09 +05:30
d8c3b6bca4 Fix CI tests (#210)
* fix ci ruuner creation

* fix python versions

* fix setup and test
2023-03-23 12:54:52 +01:00
2632e7eba7 [CI] Add ci tests (#203)
* add ci tests

* fix some tests

* fix tests

* rename

* fix

* update tests

* try

* temp hotfix

* refactor tests

* Update .github/workflows/tests.yml

* fix test
2023-03-23 12:38:40 +01:00
b5b3ae3cbe Update train_dreambooth.py 2023-03-21 21:38:15 +08:00
13e53fc7ee Merge pull request #180 from mymusise/chatglm
ChatGLM support
2023-03-20 15:02:58 +05:30
54b6ce2c0e ChatGLM support
Signed-off-by: mymusise <mymusise1@gmail.com>
2023-03-16 19:09:12 +08:00
64f63a7df2 Merge pull request #167 from dumpmemory/update-readme-patch-for-zero-init
Update README.md
2023-03-15 22:33:42 +05:30
df0e1fb592 [core] Fix peft multi-gpu issue (#145)
* add multi-gpu support

* rm deepcopy

* tryo to comment

* style

* fix nits
2023-03-14 13:06:33 +01:00
1c11bc067f Merge pull request #160 from zphang/llama
Support for LLaMA models
2023-03-14 11:18:29 +05:30
3b3fc47f84 Merge pull request #170 from PanQiWei/temporarily_replace_prepare_inputs_for_generation
Replace base_model's function temporarily
2023-03-14 11:05:01 +05:30
321cbd6829 Merge pull request #172 from huggingface/smangrul/fix-megred-linear-lora-issues
fixing merged_linear lora issues
2023-03-14 10:56:44 +05:30
43cb7040c6 fixing merged_linear lora issues 2023-03-13 16:01:28 +05:30
644d68ee6f changed: 1. replace base_model.prepare_inputs_for_generation and base_model._prepare_encoder_decoder_kwargs_for_generation temporarily 2023-03-10 10:36:11 +08:00
354bea8719 Update README.md 2023-03-09 22:03:40 +08:00
e85c18f019 Update README.md
add one caveat situation for using LoRA + ZeRO 3 setting.
2023-03-09 22:01:16 +08:00
50aaf99da7 Merge pull request #166 from huggingface/smangrul/release-v0.3.0.dev0
release v0.3.0.dev0
2023-03-09 09:24:05 +05:30
80c96de277 release v0.3.0.dev0 2023-03-09 09:23:24 +05:30
eb07373477 Merge pull request #165 from huggingface/smangrul/add-trl-example-in-readme
minor changes
2023-03-09 08:58:09 +05:30
f1980e9be2 minor changes 2023-03-09 08:57:52 +05:30
8777b5606d Merge pull request #164 from huggingface/smangrul/add-trl-example-in-readme
Update README.md
2023-03-09 08:53:56 +05:30
4497d6438c Update README.md 2023-03-09 08:53:36 +05:30
3d898adb26 Merge pull request #157 from huggingface/smangrul/lora_fixes_and_updates_wrt_trl
lora fixes and adding 8bitMegredLinear lora
2023-03-08 23:06:25 +05:30
842b09a280 Merge pull request #159 from zphang/prefix_citation
Add Prefix Tuning citation
2023-03-08 23:05:17 +05:30
91c69a80ab Merge pull request #162 from dumpmemory/fix_count
fix count
2023-03-08 23:03:58 +05:30
c1199931de Merge pull request #163 from alvanli/alvanli/add-local-saving-whisper-largev2
Add local saving for whisper largev2 example notebook
2023-03-08 23:02:36 +05:30
5e788b329d Use on save callback 2023-03-08 10:05:53 -05:00
48dc4c624e Add callback to save to local 2023-03-08 09:57:13 -05:00
d2b99c0b62 fix count
num_params should be directly used.
2023-03-08 18:41:30 +08:00
baa2a4d53f LLaMA support 2023-03-07 21:05:58 -05:00
27c2701555 Add Prefix Tuning citation 2023-03-07 19:17:35 -05:00
a43ef6ec72 fixing ds conv1D issue thanks to @dumpmemory 2023-03-08 00:53:08 +05:30
c81b6680e7 adding 8bitMegredLinear lora 2023-03-07 17:59:02 +05:30
8358b27445 Merge pull request #149 from huggingface/smangrul/fixes
minor fixes to the examples
2023-03-07 14:08:25 +05:30
b9451ab458 fixing issues and quality 2023-03-07 14:04:19 +05:30
ce4e6f3dd9 Merge pull request #150 from mayank31398/mayank/single-module
support option for encoder only prompts
2023-03-04 09:03:26 +05:30
53eb209387 support option for encoder only prompts 2023-03-03 23:43:25 +05:30
a84414f6de minor fixes to the examples 2023-03-03 19:36:13 +05:30
2c532713ad Merge pull request #125 from SauravMaheshkar/minimal-structure
chore: update `pyproject.toml`
2023-03-02 19:07:59 +05:30
fa65b95b9e update comment 2023-03-02 01:11:14 +00:00
0a0c6ea6ea adalora training example 2023-03-02 01:08:41 +00:00
7471035885 finish the testing and debugging 2023-03-02 01:04:48 +00:00
35cd771c97 example 2023-03-01 21:55:31 +00:00
1a3680d8a7 test for adalora example 2023-03-01 21:52:33 +00:00
510f172c58 adalora example 2023-03-01 21:26:07 +00:00
6a03e43cbc peft import 2023-03-01 02:47:33 -05:00
4acd811429 target module mapping for adalora 2023-03-01 02:43:27 -05:00
be86f90490 Implement the AdaLoRA 2023-02-28 23:18:19 -05:00
26b84e6fd9 add adalora example 2023-02-28 23:14:25 -05:00
94f00b7d27 chore: update Makefile with ruff commands 2023-02-28 10:46:07 +00:00
7820a539dd fix(pyproject.toml): update known_first_party
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-02-28 16:10:38 +05:30
81eec9ba70 train script 2023-02-27 21:08:55 -05:00
47601bab7c chore: drop setup.cfg as we shifted to ruff 2023-02-28 02:58:12 +05:30
99901896cc style: switch to ruff 2023-02-27 10:50:10 +00:00
5c7fe97753 Merge branch 'huggingface:main' into minimal-structure 2023-02-27 10:37:41 +00:00
aa18556c56 Merge pull request #140 from zanussbaum/zero_init
fix: count params when zero init'd
2023-02-27 13:18:30 +05:30
e6bf09db80 fix: count params when zero init'd 2023-02-26 22:31:20 -05:00
681ce93cc1 Merge pull request #134 from gabinguo/main
issue#126: torch.load device issue.
2023-02-25 11:43:43 +05:30
85ad682530 issue#126: torch.load device issue. 2023-02-25 07:09:07 +01:00
e19ee681ac Merge pull request #127 from huggingface/smangrul/make-activation-checkpointing-optional
fixes `prepare_for_int8_training`
2023-02-25 02:13:04 +05:30
83d6d55d4b address layernorm issue 2023-02-24 18:24:27 +05:30
7dfb472424 make gradient checkpointing optional when using PEFT+INT8 2023-02-24 13:02:40 +05:30
a78f8a0495 style: move isort and pytest config to pyproject 2023-02-23 11:34:18 +00:00
6175ee2c4c chore: drop MANIFEST 2023-02-23 11:30:14 +00:00
a3537160dc fix autocast issue (#121) 2023-02-23 09:17:40 +01:00
75925b1aae Merge pull request #117 from huggingface/smangrul/fix-lora-merging-in-inference
fix merging lora weights for inference
2023-02-22 00:20:40 +05:30
1ef0f89a0c add util for getting the base model 2023-02-22 00:14:24 +05:30
e6ef85a711 fix merging lora weights for inference 2023-02-22 00:00:36 +05:30
6f2803e8a7 Merge pull request #109 from huggingface/smangrul/add-gpt-neox
add `EleutherAI/gpt-neox-20b` to support matrix
2023-02-18 12:38:04 +05:30
1c9d197693 add EleutherAI/gpt-neox-20b to support matrix 2023-02-18 12:37:02 +05:30
592b1dd99f Merge pull request #106 from huggingface/smangrul/add-diable-adapter-context-manager
add disable adapter context manager
2023-02-17 20:32:31 +05:30
3240c0bb36 Merge pull request #107 from mrm8488/patch-1
Fix typo
2023-02-17 20:30:27 +05:30
e8fbcfcac3 Fix typo 2023-02-17 15:08:47 +01:00
1a8928c5a4 Update lora.py 2023-02-17 17:48:16 +05:30
173dc3dedf add disable_adapter context manager 2023-02-17 17:40:45 +05:30
dbf44fe316 [core] Some changes with prepare_model_for_training & few fixes (#105)
* changes

* apply to other notebooks
2023-02-17 10:49:11 +01:00
648fcb397c Merge pull request #104 from huggingface/smangrul/make_lora_target_modules_accept_regex
add support for regex target modules in lora
2023-02-17 15:00:58 +05:30
7aadb6d9ec add support for regex target modules in lora 2023-02-17 14:52:03 +05:30
49842e1961 Merge pull request #97 from huggingface/smangrul/make-bnb-optional
making `bnb` optional
2023-02-16 22:33:51 +05:30
44d0ac3f25 fix 2023-02-16 20:15:48 +05:30
43a9a42991 fix 2023-02-16 20:11:39 +05:30
145b13c238 making bnb optional 2023-02-16 20:07:06 +05:30
8ace5532b2 Merge pull request #95 from huggingface/smangrul/add-whisper-example
adding whisper large peft+int8 training example
2023-02-16 17:43:46 +05:30
c1281b96ff resolving comments and running jupyter black 2023-02-16 17:42:28 +05:30
ca7b46209a adding whisper large peft+int8 training example 2023-02-16 15:06:42 +05:30
81285f30a5 Merge pull request #90 from huggingface/smangrul/fix-prepare-inputs-for-training
making `prepare_model_for_training` flexible
2023-02-16 11:38:35 +05:30
c9b225d257 revert 2023-02-15 16:58:33 +05:30
af7414a67d fix forward signature 2023-02-15 15:44:26 +05:30
6d6149cf81 preventing other 1D layers to be casted in FP32 2023-02-15 14:03:47 +05:30
a31dfa3001 Merge pull request #86 from younesbelkada/add-flan-t5-int8
[`bnb`] add flan-t5 example
2023-02-15 09:02:24 +01:00
afa7739131 update 2023-02-15 08:01:56 +00:00
f1ee1e4c0f making prepare_model_for_training flexible 2023-02-15 12:51:23 +05:30
ed5a7bff6b Merge pull request #85 from younesbelkada/int8-wrapper
[`core`]  add `prepare_model_for_training`
2023-02-15 11:21:03 +05:30
42a793e2f5 update 2023-02-14 16:45:37 +00:00
eb8362bbe1 update 2023-02-14 16:44:18 +00:00
5733ea9f64 add flan example 2023-02-14 13:50:22 +00:00
36c7e3b441 apply suggestions 2023-02-14 11:50:55 +00:00
0e80648010 add prepare_model_for_training 2023-02-14 11:12:37 +00:00
be0e79c271 Merge pull request #68 from mayank31398/mayank/fp32-prompt-tuning
convert prompt tuning vocab to fp32
2023-02-13 11:26:46 +05:30
5acd392880 Merge pull request #77 from huggingface/smangrul/update-readme
Update README.md
2023-02-13 11:16:20 +05:30
951119fcfa Update README.md 2023-02-13 11:14:35 +05:30
29d608f481 Merge remote-tracking branch 'huggingface/main' into mayank/fp32-prompt-tuning 2023-02-13 11:13:18 +05:30
15de814bb4 Merge pull request #73 from Muhtasham/patch-1
Fixed typo in Readme
2023-02-13 10:56:32 +05:30
a29a12701e Fixed typo in Readme
Also added links to datasets and models, plus enhanced config render with yaml command
2023-02-11 21:58:18 +01:00
3bd50315a6 Merge pull request #72 from huggingface/sayakpaul-patch-1
Update README.md
2023-02-11 15:32:38 +05:30
45186ee04e Update README.md 2023-02-11 15:02:53 +05:30
c8e215b989 Merge pull request #69 from huggingface/smangrul/release-v0.2.0dev0
release v0.2.0.dev0
2023-02-10 15:45:22 +05:30
d1735e098c release v0.2.0.dev0 2023-02-10 15:43:06 +05:30
c53ea2c9f4 fp32 2023-02-10 09:39:02 +05:30
f8e737648a Merge pull request #67 from huggingface/smangrul/fix-save-pretrained
make `save_pretrained` work in a way training could be resumed
2023-02-10 00:14:39 +05:30
b1af297707 make save_pretrained work in a way training could be resumed 2023-02-10 00:06:25 +05:30
85c7b98307 Merge pull request #66 from huggingface/smangrul/update-bibtex
update bibtex
2023-02-09 17:26:17 +05:30
e41152e5f1 update bibtex 2023-02-09 17:25:52 +05:30
9f19ce6729 Merge pull request #64 from kashif/patch-1
Fix typos in readme
2023-02-09 17:18:36 +05:30
ae85e185ad another typo 2023-02-09 10:59:56 +01:00
93762cc658 Fix typos in readme 2023-02-09 10:38:18 +01:00
ed608025eb Merge pull request #63 from huggingface/vision-examples
add: vision examples to readme.
2023-02-09 13:57:11 +05:30
14a293a6b3 PeftModel => get_pefT_model() 2023-02-09 13:34:21 +05:30
c7b744db79 add: vision examples to readme. 2023-02-09 12:23:48 +05:30
250edccdda Merge pull request #59 from sayakpaul/example/sem-seg
add: example on semantic segmentation.
2023-02-09 12:09:18 +05:30
1daf087682 reword some things. 2023-02-09 11:22:50 +05:30
d3d601d5c3 Merge pull request #55 from huggingface/smangrul/fix-examples-with-hub-utils
many code fixes and updates to examples
2023-02-08 18:56:48 +05:30
8083c9515f update README and fix token_cls example 2023-02-08 18:54:46 +05:30
73cd16b7b5 quality 2023-02-08 18:43:00 +05:30
65112b75bb Merge branch 'main' into smangrul/fix-examples-with-hub-utils 2023-02-08 18:41:19 +05:30
3cf0b7a2d4 fix more examples 2023-02-08 18:40:57 +05:30
afb171eefb fixes and updating examples 2023-02-08 18:07:15 +05:30
b07ea17f49 update examples 2023-02-08 14:55:08 +05:30
83ded43ee7 Update peft_lora_clm_accelerate_ds_zero3_offload.py 2023-02-08 13:35:10 +05:30
537c971a47 fix 2023-02-08 13:05:27 +05:30
ed0c962ff5 fixes 2023-02-08 12:59:29 +05:30
eec0b9329d Update peft_lora_clm_accelerate_ds_zero3_offload.py 2023-02-08 12:41:27 +05:30
1929a84e1e remove peft_model_load_and_dispatch as it is part of PeftModel.from_pretrained 2023-02-08 12:29:03 +05:30
522a6b6c17 add load_and_dispatch to load_pretrained 2023-02-08 12:18:03 +05:30
462b65fe45 fix lora_only 2023-02-08 10:26:56 +05:30
2b89fbf963 add: example on semantic segmentation. 2023-02-08 09:49:13 +05:30
b5c97f2039 Update save_and_load.py 2023-02-08 09:25:21 +05:30
64d2d19598 update peft_model_load_and_dispatch 2023-02-08 09:21:49 +05:30
a7dd034710 fix prefix tuning config to remove function field as it cannot be converted to json 2023-02-08 08:49:15 +05:30
ed0bcdac4f Merge pull request #58 from sayakpaul/patch-1
Update image classification README.md to include the latest Colab Notebook link
2023-02-07 19:00:11 +05:30
bdeb3778d0 add support for generate when using prompt_tuning 2023-02-07 15:07:56 +05:30
185c852088 Update README.md 2023-02-07 12:53:37 +05:30
a1b7e42783 Merge pull request #56 from sayakpaul/example/img-cls
add: example on fine-tuning for image classification.
2023-02-07 12:51:56 +05:30
3c4b64785f Update README.md 2023-02-07 11:11:36 +05:30
ab43d6aa5c fix: inference section 2023-02-07 11:04:39 +05:30
3cf7034e9c Empty commit.
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-02-07 09:57:57 +05:30
ddb37c353c add: correct Colab link. 2023-02-07 09:55:07 +05:30
dbe3b9b99e add: example on fine-tuning for image classification. 2023-02-07 09:53:34 +05:30
5bc815e2e2 fix generate because of recent transformers release 2023-02-06 23:50:48 +05:30
5a43a3a321 seq cls examples update 2023-02-06 18:57:13 +05:30
7ae63299a8 Merge pull request #53 from younesbelkada/add-int8-example
[ `example`] add bnb example
2023-02-03 21:00:41 +05:30
57de1d2677 add bnb example 2023-02-02 17:45:38 +01:00
383b5abb33 Merge pull request #51 from huggingface/smangrul/lora-raise-error-when-no-target-module-found
for lora, raise error when no target modules in base model
2023-02-02 13:35:47 +05:30
d8ccd7d84c for lora, raise error when no target modules in base model 2023-02-02 13:29:49 +05:30
df5b201c6b Merge pull request #50 from huggingface/smangrul/add-modules-to-save-to-lora-config
add `modules_to_save` to LoraConfig and other fixes
2023-02-02 13:19:37 +05:30
44d8e72ca8 fixes 2023-02-02 13:19:14 +05:30
c37ee25be7 trying diff approaches 2023-02-01 19:35:19 +05:30
c884daf96a getting rid to forward call linking 2023-02-01 19:18:38 +05:30
fcd213708d fixes 2023-02-01 17:17:14 +05:30
915a5db0c6 fixes 2023-02-01 16:25:42 +05:30
d53a631608 fixes 2023-02-01 15:59:24 +05:30
b4d0885203 Merge pull request #49 from orenwang/main
fix validation_steps handling in dreambooth example
2023-02-01 15:42:35 +05:30
d04f6661ee add modules_to_save to LoraConfig and other fixes
1. Add `modules_to_save` to LoraConfig
2. Using PeftModel for LoraConfig instead of task-specific classes because LoRA is task agnostic.
2023-02-01 15:41:35 +05:30
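For illustration, a minimal sketch of how `modules_to_save` is typically used together with a LoRA config; the base model and the `classifier` module name are assumptions for the example, not part of this commit:
```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Hypothetical setup: LoRA on the attention projections of a RoBERTa classifier,
# while the randomly initialized classification head is trained in full and
# saved alongside the adapter weights via `modules_to_save`.
base = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
config = LoraConfig(
    task_type="SEQ_CLS",
    r=8,
    lora_alpha=32,
    target_modules=["query", "value"],
    modules_to_save=["classifier"],
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```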
80e1b262e5 fix validation_steps handling in dreambooth example 2023-01-31 10:59:46 +08:00
dd518985ff Merge pull request #47 from orenwang/main
allow validation images for lora training
2023-01-30 16:13:28 +05:30
a17cea104e add validation images for lora training 2023-01-30 17:18:43 +08:00
3f9b310c6a Merge pull request #46 from huggingface/smangrul/fix-hf-hub-utils-tests
fix hf hub util tests
2023-01-30 13:35:06 +05:30
06e49c0a87 fixes 2023-01-30 13:31:01 +05:30
6cf2cf5dae fix hf hub util tests 2023-01-30 12:51:30 +05:30
3faaf0916a Merge pull request #39 from younesbelkada/add-push-to-hub
[`core`] Add hub utils
2023-01-30 12:34:13 +05:30
6c9534e660 adapt for other models 2023-01-29 11:18:31 +00:00
22295c4278 adapt from code review
- remove `README`
- inherit from `dataclass`
- add new test
2023-01-29 10:49:31 +00:00
16182ea972 Apply suggestions from code review
Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com>
2023-01-29 11:41:38 +01:00
ad69958e52 Merge branch 'main' into add-push-to-hub 2023-01-28 10:17:57 +01:00
f8a2829318 Merge pull request #38 from huggingface/smangrul/fixes
adding support for int8 lora training
2023-01-27 19:20:12 +05:30
634f3692d8 working v1
- push to hub method works
- add tests
- add config super class
- add Lora support for `from_pretrained`
2023-01-26 11:17:24 +00:00
2cc7f2cbac add config tests 2023-01-26 10:12:51 +00:00
2896cf05fb v1 working
- from_pretrained support for config
- from_pretrained support for loramodel
- todo: tests
- todo: push_to_hub
2023-01-25 22:43:22 +00:00
776a28f053 update lora to support int8 training 2023-01-25 12:27:02 +05:30
d75746be70 adding support for int8 lora training 2023-01-25 04:19:23 +05:30
1dbe7fc0db Merge pull request #37 from huggingface/smangrul/fixes
colab notebook example for lora peft application
2023-01-24 21:20:06 +05:30
ff8a5b9a69 colab notebook example for lora peft application 2023-01-24 21:19:47 +05:30
36267af51b Merge pull request #36 from huggingface/smangrul/fixes
correcting requirements.txt in example sub-folders
2023-01-22 11:21:01 +05:30
fef162cff8 correcting requirements.txt in example sub-folders 2023-01-22 11:20:39 +05:30
a8587916c8 Merge pull request #35 from huggingface/smangrul/fixes
fixes and addressing comments from previous PR
2023-01-21 18:31:28 +05:30
77670ead76 fixes and addressing comments from previous PR
1. Minor updates/fixes in README.md and setup.py
2. Make `loralib` optional
2023-01-21 18:17:19 +05:30
360fb2f816 Merge pull request #34 from huggingface/fix-typos
Review & fix typos
2023-01-21 18:00:12 +05:30
a40f20ad6c Fix typos 2023-01-20 11:34:45 -05:00
407482eb37 Merge pull request #33 from huggingface/smangrul/fixes
fixes, docs and version bump up
2023-01-20 15:34:29 +05:30
d9e7d6cd22 fixes, docs and version bump up 2023-01-20 15:34:11 +05:30
dbf438f99d Merge pull request #32 from huggingface/v0.0.1-release
V0.0.1 release
2023-01-20 14:37:57 +05:30
369 changed files with 188467 additions and 10896 deletions

.github/ISSUE_TEMPLATE/bug-report.yml

@ -0,0 +1,70 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve the library
body:
- type: textarea
id: system-info
attributes:
label: System Info
description: Please share your relevant system information with us
placeholder: peft & accelerate & transformers version, platform, python version, ...
validations:
required: true
- type: textarea
id: who-can-help
attributes:
label: Who can help?
description: |
Your issue will be replied to more quickly if you can figure out the right person to tag with @.
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
a core maintainer will ping the right person.
Please tag fewer than 3 people.
Library: @benjaminbossan @sayakpaul
Documentation: @stevhliu
placeholder: "@Username ..."
- type: checkboxes
id: information-scripts-examples
attributes:
label: Information
description: 'The problem arises when using:'
options:
- label: "The official example scripts"
- label: "My own modified scripts"
- type: checkboxes
id: information-tasks
attributes:
label: Tasks
description: "The tasks I am working on are:"
options:
- label: "An officially supported task in the `examples` folder"
- label: "My own task or dataset (give details below)"
- type: textarea
id: reproduction
validations:
required: true
attributes:
label: Reproduction
description: |
Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
Please provide the simplest reproducer as possible so that we can quickly fix the issue. When you paste
the error message, please include the full traceback.
placeholder: |
Reproducer:
- type: textarea
id: expected-behavior
validations:
required: true
attributes:
label: Expected behavior
description: "A clear and concise description of what you would expect to happen."


@ -0,0 +1,30 @@
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new feature
labels: [ "feature" ]
body:
- type: textarea
id: feature-request
validations:
required: true
attributes:
label: Feature request
description: |
A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.
- type: textarea
id: motivation
validations:
required: true
attributes:
label: Motivation
description: |
Please outline the motivation for the proposal. Is your feature request related to a problem?
- type: textarea
id: contribution
validations:
required: true
attributes:
label: Your contribution
description: |
Is there any way that you could help, e.g. by submitting a PR?


@ -0,0 +1,172 @@
name: Build Docker images (scheduled)
on:
workflow_dispatch:
workflow_call:
schedule:
- cron: "0 1 * * *"
concurrency:
group: docker-image-builds
cancel-in-progress: false
env:
CI_SLACK_CHANNEL: ${{ secrets.CI_DOCKER_CHANNEL }}
jobs:
latest-cpu:
name: "Latest Peft CPU [dev]"
runs-on:
group: aws-general-8-plus
steps:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Build and Push CPU
uses: docker/build-push-action@v4
with:
context: ./docker/peft-cpu
push: true
tags: huggingface/peft-cpu
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-CPU docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-cuda:
name: "Latest Peft GPU [dev]"
runs-on:
group: aws-general-8-plus
steps:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Build and Push GPU
uses: docker/build-push-action@v4
with:
context: ./docker/peft-gpu
push: true
tags: huggingface/peft-gpu
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-GPU docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-cuda-bnb-source:
name: "Latest Peft GPU + bnb source [dev]"
runs-on:
group: aws-general-8-plus
steps:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Build and Push GPU
uses: docker/build-push-action@v4
with:
context: ./docker/peft-gpu-bnb-source
push: true
tags: huggingface/peft-gpu-bnb-source
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-GPU (bnb source / HF latest) docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-cuda-bnb-source-latest:
name: "Latest Peft GPU + bnb source [accelerate / peft / transformers latest]"
runs-on:
group: aws-general-8-plus
steps:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Build and Push GPU
uses: docker/build-push-action@v4
with:
context: ./docker/peft-gpu-bnb-latest
push: true
tags: huggingface/peft-gpu-bnb-latest
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-GPU (bnb source / HF source) docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-cuda-bnb-source-multi:
name: "Latest Peft GPU + bnb (multi-backend) source [accelerate / peft / transformers source]"
runs-on:
group: aws-general-8-plus
steps:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
- name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
- name: Build and Push GPU
uses: docker/build-push-action@v4
with:
context: ./docker/peft-gpu-bnb-multi-source
push: true
tags: huggingface/peft-gpu-bnb-multi-source
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the PEFT-GPU (bnb source multi-backend / HF latest) docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}


@ -0,0 +1,20 @@
name: Build documentation
on:
push:
branches:
- main
- doc-builder*
- v*-release
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
with:
commit_sha: ${{ github.sha }}
package: peft
notebook_folder: peft_docs
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}


@ -0,0 +1,17 @@
name: Build PR Documentation
on:
pull_request:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
with:
commit_sha: ${{ github.event.pull_request.head.sha }}
pr_number: ${{ github.event.number }}
package: peft
custom_container: huggingface/transformers-doc-builder


@ -0,0 +1,82 @@
name: integration tests
on:
workflow_dispatch:
inputs:
branch:
description: 'Branch to test on'
required: true
jobs:
run_transformers_integration_tests:
strategy:
fail-fast: false
matrix:
transformers-version: ['main', 'latest']
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: "pip"
cache-dependency-path: "setup.py"
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install .[test]
if [ "${{ matrix.transformers-version }}" == "main" ]; then
pip install -U git+https://github.com/huggingface/transformers.git
else
echo "Nothing to do as transformers latest already installed"
fi
- name: Test transformers integration
run: |
cd .. && git clone https://github.com/huggingface/transformers.git && cd transformers/ && git rev-parse HEAD
RUN_SLOW=1 pytest tests/peft_integration/test_peft_integration.py
run_diffusers_integration_tests:
strategy:
fail-fast: false
matrix:
# For now diffusers integration is not on PyPI
diffusers-version: ['main']
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.10"
cache: "pip"
cache-dependency-path: "setup.py"
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install .[test]
if [ "${{ matrix.diffusers-version }}" == "main" ]; then
pip install -U git+https://github.com/huggingface/diffusers.git
else
echo "Nothing to do as diffusers latest already installed"
fi
- name: Test diffusers integration
run: |
cd .. && git clone https://github.com/huggingface/diffusers.git && cd diffusers/ && git rev-parse HEAD
pytest tests/lora/test_lora_layers_peft.py

.github/workflows/nightly-bnb.yml

@ -0,0 +1,249 @@
name: BNB from source self-hosted runner with slow tests (scheduled)
on:
workflow_dispatch:
schedule:
- cron: "0 2 * * *"
env:
RUN_SLOW: "yes"
IS_GITHUB_CI: "1"
# To be able to run tests on CUDA 12.2
NVIDIA_DISABLE_REQUIRE: "1"
SLACK_API_TOKEN: ${{ secrets.SLACK_API_TOKEN }}
jobs:
run_all_tests_single_gpu:
timeout-minutes: 60
strategy:
fail-fast: false
matrix:
docker-image-name: ["huggingface/peft-gpu-bnb-source:latest", "huggingface/peft-gpu-bnb-latest:latest", "huggingface/peft-gpu-bnb-multi-source:latest"]
runs-on:
group: aws-g6-4xlarge-plus
env:
CUDA_VISIBLE_DEVICES: "0"
TEST_TYPE: "single_gpu_${{ matrix.docker-image-name }}"
container:
image: ${{ matrix.docker-image-name }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
defaults:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
pip install -e . --no-deps
pip install pytest-reportlog pytest-cov parameterized datasets scipy einops
pip install "pytest>=7.2.0,<8.0.0" # see: https://github.com/huggingface/transformers/blob/ce4fff0be7f6464d713f7ac3e0bbaafbc6959ae5/setup.py#L148C6-L148C26
mkdir transformers-clone && git clone https://github.com/huggingface/transformers.git transformers-clone # rename to transformers clone to avoid modules conflict
if [ "${{ matrix.docker-image-name }}" == "huggingface/peft-gpu-bnb-latest:latest" ]; then
cd transformers-clone
transformers_version=$(pip show transformers | grep '^Version:' | cut -d ' ' -f2 | sed 's/\.dev0//')
echo "Checking out tag for Transformers version: v$transformers_version"
git fetch --tags
git checkout tags/v$transformers_version
cd ..
fi
- name: Test bnb import
id: import
if: always()
run: |
source activate peft
python3 -m bitsandbytes
python3 -c "import bitsandbytes as bnb"
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes import
status: ${{ steps.import.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run examples on single GPU
id: examples_tests
if: always()
run: |
source activate peft
make tests_examples_single_gpu_bnb
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes examples tests - single GPU
status: ${{ steps.examples_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run core tests on single GPU
id: core_tests
if: always()
run: |
source activate peft
make tests_core_single_gpu_bnb
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes core tests - single GPU
status: ${{ steps.core_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
# TODO: this is a test to see if BNB multi-backend single-GPU tests succeed w/o regression tests
# - name: Run BNB regression tests on single GPU
# id: regression_tests
# if: always()
# run: |
# source activate peft
# make tests_gpu_bnb_regression
# - name: Post to Slack
# if: always()
# uses: huggingface/hf-workflows/.github/actions/post-slack@main
# with:
# slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
# title: 🤗 Results of bitsandbytes regression tests - single GPU
# status: ${{ steps.regression_tests.outcome }}
# slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run transformers tests on single GPU
id: transformers_tests
if: always()
run: |
source activate peft
make transformers_tests
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes transformers tests - single GPU
status: ${{ steps.transformers_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Generate Report
if: always()
run: |
pip install slack_sdk tabulate
python scripts/log_reports.py --slack_channel_name bnb-daily-ci-collab >> $GITHUB_STEP_SUMMARY
run_all_tests_multi_gpu:
timeout-minutes: 60
strategy:
fail-fast: false
matrix:
docker-image-name: ["huggingface/peft-gpu-bnb-source:latest", "huggingface/peft-gpu-bnb-latest:latest", "huggingface/peft-gpu-bnb-multi-source:latest"]
runs-on:
group: aws-g6-12xlarge-plus
env:
CUDA_VISIBLE_DEVICES: "0,1"
TEST_TYPE: "multi_gpu_${{ matrix.docker-image-name }}"
container:
image: ${{ matrix.docker-image-name }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
defaults:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
pip install -e . --no-deps
pip install pytest-reportlog pytest-cov parameterized datasets scipy einops
pip install "pytest>=7.2.0,<8.0.0" # see: https://github.com/huggingface/transformers/blob/ce4fff0be7f6464d713f7ac3e0bbaafbc6959ae5/setup.py#L148C6-L148C26
mkdir transformers-clone && git clone https://github.com/huggingface/transformers.git transformers-clone
if [ "${{ matrix.docker-image-name }}" == "huggingface/peft-gpu-bnb-latest:latest" ]; then
cd transformers-clone
transformers_version=$(pip show transformers | grep '^Version:' | cut -d ' ' -f2 | sed 's/\.dev0//')
echo "Checking out tag for Transformers version: v$transformers_version"
git fetch --tags
git checkout tags/v$transformers_version
cd ..
fi
- name: Test bnb import
id: import
if: always()
run: |
source activate peft
python3 -m bitsandbytes
python3 -c "import bitsandbytes as bnb"
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes import
status: ${{ steps.import.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run core GPU tests on multi-gpu
if: always()
run: |
source activate peft
- name: Run examples on multi GPU
id: examples_tests
if: always()
run: |
source activate peft
make tests_examples_multi_gpu_bnb
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes examples tests - multi GPU
status: ${{ steps.examples_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run core tests on multi GPU
id: core_tests
if: always()
run: |
source activate peft
make tests_core_multi_gpu_bnb
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes core tests - multi GPU
status: ${{ steps.core_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Run transformers tests on multi GPU
id: transformers_tests
if: always()
run: |
source activate peft
make transformers_tests
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.BNB_SLACK_CHANNEL_ID }}
title: 🤗 Results of bitsandbytes transformers tests - multi GPU
status: ${{ steps.transformers_tests.outcome }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
- name: Generate Report
if: always()
run: |
pip install slack_sdk tabulate
python scripts/log_reports.py --slack_channel_name bnb-daily-ci-collab >> $GITHUB_STEP_SUMMARY

.github/workflows/nightly.yml

@ -0,0 +1,110 @@
name: Self-hosted runner with slow tests (scheduled)
on:
workflow_dispatch:
schedule:
- cron: "0 2 * * *"
env:
RUN_SLOW: "yes"
IS_GITHUB_CI: "1"
# To be able to run tests on CUDA 12.2
NVIDIA_DISABLE_REQUIRE: "1"
SLACK_API_TOKEN: ${{ secrets.SLACK_API_TOKEN }}
jobs:
run_all_tests_single_gpu:
strategy:
fail-fast: false
runs-on:
group: aws-g6-4xlarge-plus
env:
CUDA_VISIBLE_DEVICES: "0"
TEST_TYPE: "single_gpu"
container:
image: huggingface/peft-gpu:latest
options: --gpus all --shm-size "16gb" -e NVIDIA_DISABLE_REQUIRE=true
defaults:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
pip install -e . --no-deps
pip install pytest-reportlog
- name: Run common tests on single GPU
run: |
source activate peft
make tests_common_gpu
- name: Run examples on single GPU
run: |
source activate peft
make tests_examples_single_gpu
- name: Run core tests on single GPU
run: |
source activate peft
make tests_core_single_gpu
- name: Run regression tests on single GPU
run: |
source activate peft
make tests_regression
- name: Generate Report
if: always()
run: |
pip install slack_sdk tabulate
python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY
run_all_tests_multi_gpu:
strategy:
fail-fast: false
runs-on:
group: aws-g6-12xlarge-plus
env:
CUDA_VISIBLE_DEVICES: "0,1"
TEST_TYPE: "multi_gpu"
container:
image: huggingface/peft-gpu:latest
options: --gpus all --shm-size "16gb" -e NVIDIA_DISABLE_REQUIRE=true
defaults:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- name: Pip install
run: |
source activate peft
pip install -e . --no-deps
pip install pytest-reportlog
- name: Run core GPU tests on multi-gpu
run: |
source activate peft
- name: Run common tests on multi GPU
run: |
source activate peft
make tests_common_gpu
- name: Run examples on multi GPU
run: |
source activate peft
make tests_examples_multi_gpu
- name: Run core tests on multi GPU
run: |
source activate peft
make tests_core_multi_gpu
- name: Generate Report
if: always()
run: |
pip install slack_sdk tabulate
python scripts/log_reports.py >> $GITHUB_STEP_SUMMARY

.github/workflows/stale.yml

@ -0,0 +1,30 @@
name: Stale Bot
on:
schedule:
- cron: "0 15 * * *"
jobs:
close_stale_issues:
name: Close Stale Issues
if: github.repository == 'huggingface/peft'
runs-on: ubuntu-latest
permissions:
issues: write
pull-requests: write
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: 3.8
- name: Install requirements
run: |
pip install PyGithub
- name: Close stale issues
run: |
python scripts/stale.py

.github/workflows/test-docker-build.yml

@ -0,0 +1,59 @@
name: Test Docker images (on PR)
on:
pull_request:
paths:
# Run only when DockerFile files are modified
- "docker/**"
jobs:
get_changed_files:
name: "Build all modified docker images"
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- name: Check out code
uses: actions/checkout@v3
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@1c8e6069583811afb28f97afeaf8e7da80c6be5c #v42
with:
files: docker/**
json: "true"
- name: Run step if only the files listed above change
if: steps.changed-files.outputs.any_changed == 'true'
id: set-matrix
env:
ALL_CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
run: |
echo "matrix=${{ steps.changed-files.outputs.all_changed_files}}" >> $GITHUB_OUTPUT
build_modified_files:
needs: get_changed_files
name: Build Docker images on modified files
runs-on: ubuntu-latest
if: ${{ needs.get_changed_files.outputs.matrix }} != ''
strategy:
fail-fast: false
matrix:
docker-file: ${{ fromJson(needs.get_changed_files.outputs.matrix) }}
steps:
- name: Cleanup disk
run: |
sudo ls -l /usr/local/lib/
sudo ls -l /usr/share/
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo du -sh /usr/local/lib/
sudo du -sh /usr/share/
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
- name: Build Docker image
uses: docker/build-push-action@v4
with:
file: ${{ matrix.docker-file }}
context: .
push: False

.github/workflows/tests-main.yml

@ -0,0 +1,36 @@
name: tests on transformers main
on:
push:
branches: [main]
paths-ignore:
- 'docs/**'
jobs:
tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python 3.11
uses: actions/setup-python@v4
with:
python-version: 3.11
cache: "pip"
cache-dependency-path: "setup.py"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
# cpu version of pytorch
pip install -U git+https://github.com/huggingface/transformers.git
pip install -e .[test]
- name: Test with pytest
run: |
make test
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.SLACK_CHANNEL_ID }}
title: 🤗 Results of transformers main tests
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

.github/workflows/tests.yml

@ -0,0 +1,61 @@
name: tests
on:
push:
branches: [main]
paths-ignore:
- 'docs/**'
pull_request:
paths-ignore:
- 'docs/**'
jobs:
check_code_quality:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.8"
cache: "pip"
cache-dependency-path: "setup.py"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[dev]
- name: Check quality
run: |
make quality
tests:
needs: check_code_quality
strategy:
# TODO: remove 'fail-fast' line once timeout issue from the Hub is solved
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
os: ["ubuntu-latest", "macos-12", "windows-latest"]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
cache: "pip"
cache-dependency-path: "setup.py"
- name: Install dependencies
run: |
python -m pip install --upgrade pip
# cpu version of pytorch
pip install -e .[test]
- name: Downgrade numpy on MacOS and Windows
# TODO: remove numpy downgrade on MacOS & Windows once torch fixes numpy 2.0 issue
shell: bash
if: matrix.os == 'windows-latest' || matrix.os == 'macos-12'
run: |
pip install --force-reinstall -U "numpy<2.0.0"
- name: Test with pytest
run: |
make test


@ -0,0 +1,52 @@
name: torch compile tests
on:
workflow_dispatch:
inputs:
branch:
description: 'Branch to test on'
required: true
pytorch_nightly:
description: 'Whether to use PyTorch nightly (true/false)'
required: false
default: false
env:
RUN_SLOW: "yes"
IS_GITHUB_CI: "1"
# To be able to run tests on CUDA 12.2
NVIDIA_DISABLE_REQUIRE: "1"
jobs:
run_tests_with_compile:
runs-on:
group: aws-g6-4xlarge-plus
env:
PEFT_DEBUG_WITH_TORCH_COMPILE: 1
CUDA_VISIBLE_DEVICES: "0"
TEST_TYPE: "single_gpu_huggingface/peft-gpu-bnb-latest:latest"
container:
image: "huggingface/peft-gpu-bnb-latest:latest"
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
defaults:
run:
shell: bash
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
- name: Pip install
run: |
source activate peft
pip install -e . --no-deps
pip install pytest-cov pytest-reportlog parameterized datasets scipy einops
pip install "pytest>=7.2.0,<8.0.0" # see: https://github.com/huggingface/transformers/blob/ce4fff0be7f6464d713f7ac3e0bbaafbc6959ae5/setup.py#L148C6-L148C26
if [ "${{ github.event.inputs.pytorch_nightly }}" = "true" ]; then
python -m pip install --upgrade --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu
fi
- name: Test compile with pytest
run: |
source activate peft
echo "PEFT_DEBUG_WITH_TORCH_COMPILE=$PEFT_DEBUG_WITH_TORCH_COMPILE"
make tests_torch_compile

.github/workflows/trufflehog.yml

@ -0,0 +1,15 @@
on:
push:
name: Secret Leaks
jobs:
trufflehog:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Secret Scanning
uses: trufflesecurity/trufflehog@main


@ -0,0 +1,16 @@
name: Upload PR Documentation
on:
workflow_run:
workflows: ["Build PR Documentation"]
types:
- completed
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
with:
package_name: peft
secrets:
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}

.gitignore

@ -138,4 +138,4 @@ dmypy.json
.DS_Store
# More test things
wandb
wandb

.pre-commit-config.yaml

@ -0,0 +1,13 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.1
hooks:
- id: ruff
args:
- --fix
- id: ruff-format
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
hooks:
- id: check-merge-conflict
- id: check-yaml


@ -1 +0,0 @@
include LICENSE


@ -1,19 +1,61 @@
.PHONY: quality style test docs
check_dirs := src examples
check_dirs := src tests examples docs scripts docker
# Check that source code meets quality standards
# this target runs checks on all files
quality:
black --check $(check_dirs)
isort --check-only $(check_dirs)
flake8 $(check_dirs)
doc-builder style src --max_len 119 --check_only
ruff check $(check_dirs)
ruff format --check $(check_dirs)
doc-builder style src/peft tests docs/source --max_len 119 --check_only
# Format source code automatically and check if there are any problems left that need manual fixing
style:
black $(check_dirs)
isort $(check_dirs)
doc-builder style src --max_len 119
ruff check --fix $(check_dirs)
ruff format $(check_dirs)
doc-builder style src/peft tests docs/source --max_len 119
test:
python -m pytest -n 3 tests/ $(if $(IS_GITHUB_CI),--report-log "ci_tests.log",)
tests_examples_multi_gpu:
python -m pytest -m multi_gpu_tests tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "multi_gpu_examples.log",)
tests_examples_single_gpu:
python -m pytest -m single_gpu_tests tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "single_gpu_examples.log",)
tests_core_multi_gpu:
python -m pytest -m multi_gpu_tests tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_multi_gpu.log",)
tests_core_single_gpu:
python -m pytest -m single_gpu_tests tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_single_gpu.log",)
tests_common_gpu:
python -m pytest tests/test_decoder_models.py $(if $(IS_GITHUB_CI),--report-log "common_decoder.log",)
python -m pytest tests/test_encoder_decoder_models.py $(if $(IS_GITHUB_CI),--report-log "common_encoder_decoder.log",)
tests_examples_multi_gpu_bnb:
python -m pytest -m "multi_gpu_tests and bitsandbytes" tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "multi_gpu_examples.log",)
tests_examples_single_gpu_bnb:
python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_gpu_examples.py $(if $(IS_GITHUB_CI),--report-log "single_gpu_examples.log",)
tests_core_multi_gpu_bnb:
python -m pytest -m "multi_gpu_tests and bitsandbytes" tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_multi_gpu.log",)
tests_core_single_gpu_bnb:
python -m pytest -m "single_gpu_tests and bitsandbytes" tests/test_common_gpu.py $(if $(IS_GITHUB_CI),--report-log "core_single_gpu.log",)
tests_gpu_bnb_regression:
python -m pytest tests/bnb/test_bnb_regression.py $(if $(IS_GITHUB_CI),--report-log "bnb_regression_gpu.log",)
# For testing transformers tests for bnb runners
transformers_tests:
RUN_SLOW=1 python -m pytest transformers-clone/tests/quantization/bnb $(if $(IS_GITHUB_CI),--report-log "transformers_tests.log",)
tests_regression:
python -m pytest -s --regression tests/regression/ $(if $(IS_GITHUB_CI),--report-log "regression_tests.log",)
tests_torch_compile:
python -m pytest tests/test_torch_compile.py $(if $(IS_GITHUB_CI),--report-log "compile_tests.log",)

README.md

@ -19,43 +19,67 @@ limitations under the License.
<p>State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods</p>
</h3>
Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. In this regard, PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs. Recent State-of-the-Art PEFT techniques achieve performance comparable to that of full fine-tuning.
Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.
Seamlessly integrated with 🤗 Accelerate for large scale models leveraging PyTorch FSDP.
PEFT is integrated with Transformers for easy model training and inference, Diffusers for conveniently managing different adapters, and Accelerate for distributed training and inference for really big models.
Supported methods:
> [!TIP]
> Visit the [PEFT](https://huggingface.co/PEFT) organization to read about the PEFT methods implemented in the library and to see notebooks demonstrating how to apply these methods to a variety of downstream tasks. Click the "Watch repos" button on the organization page to be notified of newly implemented methods and notebooks!
1. LoRA: [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685.pdf)
2. Prefix Tuning: [P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks](https://arxiv.org/pdf/2110.07602.pdf)
3. P-Tuning: [GPT Understands, Too](https://arxiv.org/pdf/2103.10385.pdf)
4. Prompt Tuning: [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/pdf/2104.08691.pdf)
Check the PEFT Adapters API Reference section for a list of supported PEFT methods, and read the [Adapters](https://huggingface.co/docs/peft/en/conceptual_guides/adapter), [Soft prompts](https://huggingface.co/docs/peft/en/conceptual_guides/prompting), and [IA3](https://huggingface.co/docs/peft/en/conceptual_guides/ia3) conceptual guides to learn more about how these methods work.
## Getting started
## Quickstart
Install PEFT from pip:
```bash
pip install peft
```
Prepare a model for training with a PEFT method such as LoRA by wrapping the base model and PEFT configuration with `get_peft_model`. For the bigscience/mt0-large model, you're only training 0.19% of the parameters!
```python
from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_config, get_peft_model, LoRAConfig, TaskType
from peft import get_peft_config, get_peft_model, LoraConfig, TaskType
model_name_or_path = "bigscience/mt0-large"
tokenizer_name_or_path = "bigscience/mt0-large"
peft_config = LoRAConfig(
peft_config = LoraConfig(
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# output: trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282
"trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282"
```
## Use Cases
### Get comparable performance to full finetuning by adapting LLMs to downstream tasks using consumer hardware
GPU memory required for adapting LLMs on the few-shot dataset `ought/raft/twitter_complaints`. Here, settings considered are full finetuning, PEFT-LoRA using plain PyTorch and PEFT-LoRA using DeepSpeed with CPU Offloading.
Hardware: Single A100 80GB GPU with CPU RAM above 64GB
To load a PEFT model for inference:
```py
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch
model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model.eval()
inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=50)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
"Preheat the oven to 350 degrees and place the cookie dough in the center of the oven. In a large bowl, combine the flour, baking powder, baking soda, salt, and cinnamon. In a separate bowl, combine the egg yolks, sugar, and vanilla."
```
## Why you should use PEFT
There are many benefits of using PEFT but the main one is the huge savings in compute and storage, making PEFT applicable to many different use cases.
### High performance on consumer hardware
Consider the memory requirements for training the following models on the [ought/raft/twitter_complaints](https://huggingface.co/datasets/ought/raft/viewer/twitter_complaints) dataset with an A100 80GB GPU with more than 64GB of CPU RAM.
| Model | Full Finetuning | PEFT-LoRA PyTorch | PEFT-LoRA DeepSpeed with CPU Offloading |
| --------- | ---- | ---- | ---- |
| bigscience/mt0-xxl (12B params) | OOM GPU | 56GB GPU / 3GB CPU | 22GB GPU / 52GB CPU |
| bigscience/bloomz-7b1 (7B params) | OOM GPU | 32GB GPU / 3.8GB CPU | 18.1GB GPU / 35GB CPU |
With LoRA you can fully finetune a 12B parameter model that would've otherwise run out of memory on the 80GB GPU, and comfortably fit and train a 3B parameter model. When you look at the 3B parameter model's performance, it is comparable to a fully finetuned model at a fraction of the GPU memory.

The table below shows the performance of PEFT-LoRA tuned `bigscience/T0_3B` on the `ought/raft/twitter_complaints` leaderboard.
| Submission Name | Accuracy |
| --------- | ---- |
| Flan-T5 | 0.892 |
| lora-t0-3b | 0.863 |
**Therefore, we can see that performance comparable to SoTA is achievable by PEFT methods with consumer hardware such as 16GB and 24GB GPUs.**
> [!TIP]
> The bigscience/T0_3B model performance isn't optimized in the table above. You can squeeze even more performance out of it by playing around with the input instruction templates, LoRA hyperparameters, and other training related hyperparameters. The final checkpoint size of this model is just 19MB compared to 11GB of the full bigscience/T0_3B model. Learn more about the advantages of finetuning with PEFT in this [blog post](https://www.philschmid.de/fine-tune-flan-t5-peft).
### Quantization

Quantization is another method for reducing the memory requirements of a model by representing the data in a lower precision. It can be combined with PEFT methods to make it even easier to train and load LLMs for inference; a minimal sketch of the combination is shown after the list below.

* Learn how to finetune [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) with QLoRA and the [TRL](https://huggingface.co/docs/trl/index) library on a 16GB GPU in the [Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem](https://pytorch.org/blog/finetune-llms/) blog post.
* Learn how to finetune a [openai/whisper-large-v2](https://huggingface.co/openai/whisper-large-v2) model for multilingual automatic speech recognition with LoRA and 8-bit quantization in this [notebook](https://colab.research.google.com/drive/1DOkD_5OUjFa0r5Ik3SgywJLJtEo2qLxO?usp=sharing) (see this [notebook](https://colab.research.google.com/drive/1vhF8yueFqha3Y3CpTHN6q9EVcII9EYzs?usp=sharing) instead for an example of streaming a dataset).
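A minimal sketch of QLoRA-style training (the base model id and hyperparameters here are only illustrative, and `bitsandbytes` must be installed):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit to cut its memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)  # make the quantized model ready for training

# Train only a small LoRA adapter on top of the frozen, quantized base model.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```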
### Save compute and storage

PEFT can help you save storage by avoiding full finetuning of models on each downstream task or dataset. In many cases, you're only finetuning a very small fraction of a model's parameters and each checkpoint is only a few MBs in size (instead of GBs). These smaller PEFT adapters demonstrate performance comparable to a fully finetuned model. If you have many datasets, you can save a lot of storage with a PEFT model and not have to worry about catastrophic forgetting or overfitting the backbone or base model.
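As a small sketch of what this looks like in practice (the save path is a placeholder), only the adapter weights are written to disk and can later be re-attached to the shared base model:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16))

# ... train on task A ...
model.save_pretrained("opt-350m-lora-task-a")  # writes only the few-MB adapter files

# Later: load the shared base model once and attach the small task-specific adapter.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model = PeftModel.from_pretrained(base, "opt-350m-lora-task-a")
```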
## PEFT integrations
PEFT is widely supported across the Hugging Face ecosystem because of the massive efficiency it brings to training and inference.
### Diffusers
The iterative diffusion process consumes a lot of memory which can make it difficult to train. PEFT can help reduce the memory requirements and reduce the storage size of the final model checkpoint. For example, consider the memory required for training a Stable Diffusion model with LoRA on an A100 80GB GPU with more than 64GB of CPU RAM. The final model checkpoint size is only 8.8MB!
| Model | Full Finetuning | PEFT-LoRA | PEFT-LoRA with Gradient Checkpointing |
| --------- | ---- | ---- | ---- |
| CompVis/stable-diffusion-v1-4 | 27.5GB GPU / 3.97GB CPU | 15.5GB GPU / 3.84GB CPU | 8.12GB GPU / 3.77GB CPU |
> [!TIP]
> Take a look at the [examples/lora_dreambooth/train_dreambooth.py](examples/lora_dreambooth/train_dreambooth.py) training script to try training your own Stable Diffusion model with LoRA, and play around with the [smangrul/peft-lora-sd-dreambooth](https://huggingface.co/spaces/smangrul/peft-lora-sd-dreambooth) Space which is running on a T4 instance. Learn more about the PEFT integration in Diffusers in this [tutorial](https://huggingface.co/docs/peft/main/en/tutorial/peft_integrations#diffusers).
**Training**

An example of using LoRA for parameter efficient DreamBooth training is given in `~examples/lora_dreambooth/train_dreambooth.py`:

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"  # or "stabilityai/stable-diffusion-2-1"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --train_text_encoder \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --use_lora \
  --lora_r 16 \
  --lora_alpha 27 \
  --lora_text_encoder_r 16 \
  --lora_text_encoder_alpha 17 \
  --learning_rate=1e-4 \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --max_train_steps=800
```
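After training, the LoRA weights can be loaded back into a Diffusers pipeline for inference. A minimal sketch, assuming the adapter was exported in a format Diffusers can read (see the Diffusers integration tutorial linked above; the path and prompt are placeholders):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path-to-save-model")  # directory containing the trained LoRA weights

image = pipe("a photo of sks dog in a bucket", num_inference_steps=30).images[0]
image.save("sks_dog.png")
```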
Try out the 🤗 Gradio Space which should run seamlessly on a T4 instance: [smangrul/peft-lora-sd-dreambooth](https://huggingface.co/spaces/smangrul/peft-lora-sd-dreambooth).

![peft lora dreambooth gradio space](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/peft_lora_dreambooth_gradio_space.png)

### Accelerate

[Accelerate](https://huggingface.co/docs/accelerate/index) is a library for distributed training and inference on various training setups and hardware (GPUs, TPUs, Apple Silicon, etc.). PEFT models work with Accelerate out of the box, making it really convenient to train really large models or use them for inference on consumer hardware with limited resources.
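A minimal sketch of an Accelerate-powered training loop with a PEFT model (the model, dataset, and hyperparameters are placeholders):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model

accelerator = Accelerator()

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
base = AutoModelForSequenceClassification.from_pretrained("roberta-base")
model = get_peft_model(base, LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=16))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Tiny dummy dataloader; replace with your real dataset.
enc = tokenizer(["great movie", "terrible movie"], padding=True, return_tensors="pt")
dataset = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor([1, 0]))
loader = DataLoader(dataset, batch_size=2)

# Accelerate handles device placement and distributed wrappers.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for input_ids, attention_mask, labels in loader:
    outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
    accelerator.backward(outputs.loss)  # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```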
### TRL

PEFT can also be applied to training LLMs with RLHF components such as the ranker and policy. Get started by reading:
* [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac) with PEFT and the [TRL](https://huggingface.co/docs/trl/index) library to learn more about the Direct Preference Optimization (DPO) method and how to apply it to a LLM.
* [Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU](https://huggingface.co/blog/trl-peft) with PEFT and the [TRL](https://huggingface.co/docs/trl/index) library, and then try out the [gpt2-sentiment_peft.ipynb](https://github.com/huggingface/trl/blob/main/examples/notebooks/gpt2-sentiment.ipynb) notebook to optimize GPT2 to generate positive movie reviews.
* [StackLLaMA: A hands-on guide to train LLaMA with RLHF](https://huggingface.co/blog/stackllama) with PEFT, and then try out the [stack_llama/scripts](https://github.com/huggingface/trl/tree/main/examples/research_projects/stack_llama/scripts) for supervised finetuning, reward modeling, and RL finetuning.
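TRL trainers such as `SFTTrainer` and `DPOTrainer` accept a `peft_config` directly. A rough sketch with `SFTTrainer` (argument names have shifted between TRL versions, so treat this as illustrative; the model and dataset are placeholders):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import SFTTrainer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
dataset = load_dataset("imdb", split="train[:1%]")  # placeholder dataset with a "text" column

peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16, lora_dropout=0.1)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    peft_config=peft_config,      # SFTTrainer applies the PEFT wrapping internally
    dataset_text_field="text",
    max_seq_length=512,
)
trainer.train()
```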
## Model support

Use this [Space](https://stevhliu-peft-methods.hf.space) or check out the [docs](https://huggingface.co/docs/peft/main/en/index) to find which models officially support a PEFT method out of the box. Even if you don't see a model listed below, you can manually configure the model config to enable PEFT for a model. Read the [New transformers architecture](https://huggingface.co/docs/peft/main/en/developer_guides/custom_models#new-transformers-architectures) guide to learn how.

PEFT also pays off for medium and small models. For example, adapting `LayoutLMForTokenClassification` on the FUNSD dataset with LoRA (see `~examples/token_classification/PEFT_LoRA_LayoutLMForTokenClassification_on_FUNSD.py`) trains only 0.62% of the parameters, reaches an F1 of 0.777 compared to 0.786 for full finetuning (without any hyperparameter tuning), and produces a checkpoint of only 2.8MB. Notebooks for finetuning `roberta-large` on the GLUE MRPC dataset using different PEFT methods are given in `~examples/sequence_classification`. If a model isn't covered by the built-in defaults, you can usually still apply a PEFT method by naming the target modules yourself, as shown in the sketch below.
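A small sketch with a toy `torch.nn` module (the layer names are illustrative):

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(10, 200)
        self.relu = nn.ReLU()
        self.lin2 = nn.Linear(200, 2)

    def forward(self, x):
        return self.lin2(self.relu(self.lin1(x)))

# Point LoRA at the modules you want to adapt by name.
config = LoraConfig(target_modules=["lin1", "lin2"], r=8, lora_alpha=16)
peft_model = get_peft_model(MLP(), config)
peft_model.print_trainable_parameters()
```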
## PEFT + 🤗 Accelerate
PEFT models work with 🤗 Accelerate out of the box. Use 🤗 Accelerate for distributed training on various hardware such as GPUs and Apple Silicon devices during training, and for inference on consumer hardware with limited resources.
### Example of PEFT model training using 🤗 Accelerate's DeepSpeed integration

Currently, DeepSpeed requires the PR [ZeRO3 handling frozen weights](https://github.com/microsoft/DeepSpeed/pull/2653) to fix the issue [[REQUEST] efficiently deal with frozen weights during training](https://github.com/microsoft/DeepSpeed/issues/2615). An example is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py`.

a. First, run `accelerate config --config_file ds_zero3_cpu.yaml` and answer the questionnaire.
Below are the contents of the config file.
```
compute_environment: LOCAL_MACHINE
deepspeed_config:
gradient_accumulation_steps: 1
gradient_clipping: 1.0
offload_optimizer_device: cpu
offload_param_device: cpu
zero3_init_flag: true
zero3_save_16bit_model: true
zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false
```
b. Run the command below to launch the example script:
```
accelerate launch --config_file ds_zero3_cpu.yaml examples/peft_lora_seq2seq_accelerate_ds_zero3_offload.py
```
c. output logs:
```bash
GPU Memory before entering the train : 1916
GPU Memory consumed at the end of the train (end-begin): 66
GPU Peak Memory consumed during the train (max-begin): 7488
GPU Total Peak Memory consumed during the train (max): 9404
CPU Memory before entering the train : 19411
CPU Memory consumed at the end of the train (end-begin): 0
CPU Peak Memory consumed during the train (max-begin): 0
CPU Total Peak Memory consumed during the train (max): 19411
epoch=4: train_ppl=tensor(1.0705, device='cuda:0') train_epoch_loss=tensor(0.0681, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:27<00:00, 3.92s/it]
GPU Memory before entering the eval : 1982
GPU Memory consumed at the end of the eval (end-begin): -66
GPU Peak Memory consumed during the eval (max-begin): 672
GPU Total Peak Memory consumed during the eval (max): 2654
CPU Memory before entering the eval : 19411
CPU Memory consumed at the end of the eval (end-begin): 0
CPU Peak Memory consumed during the eval (max-begin): 0
CPU Total Peak Memory consumed during the eval (max): 19411
accuracy=100.0
eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']
dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']
```
### Example of PEFT model inference using 🤗 Accelerate's Big Model Inference capabilities

An example is provided in `~examples/causal_language_modeling/peft_lora_clm_accelerate_big_model_inference.ipynb`.
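A minimal sketch of the idea (reusing the `ybelkada/opt-350m-lora` adapter from the quickstart; `device_map="auto"` lets Accelerate dispatch the weights across the available devices):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "ybelkada/opt-350m-lora", device_map="auto", torch_dtype=torch.float16
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```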
## Models support matrix
### Causal Language Modeling
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| GPT-2 | ✅ | ✅ | ✅ | ✅ |
| Bloom | ✅ | ✅ | ✅ | ✅ |
| OPT | ✅ | ✅ | ✅ | ✅ |
| GPT-Neo | ✅ | ✅ | ✅ | ✅ |
| GPT-J | ✅ | ✅ | ✅ | ✅ |
### Conditional Generation
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| T5 | ✅ | ✅ | ✅ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ |
### Sequence Classification
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| BERT | ✅ | ✅ | ✅ | ✅ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ |
| GPT-2 | ✅ | ✅ | ✅ | ✅ |
| Bloom | ✅ | ✅ | ✅ | ✅ |
| OPT | ✅ | ✅ | ✅ | ✅ |
| GPT-Neo | ✅ | ✅ | ✅ | ✅ |
| GPT-J | ✅ | ✅ | ✅ | ✅ |
| Deberta | ✅ | | ✅ | ✅ |
| Deberta-v2 | ✅ | | ✅ | ✅ |
### Token Classification
| Model | LoRA | Prefix Tuning | P-Tuning | Prompt Tuning |
| --------- | ---- | ---- | ---- | ---- |
| BERT | ✅ | ✅ | | |
| RoBERTa | ✅ | ✅ | | |
| GPT-2 | ✅ | ✅ | | |
| Bloom | ✅ | ✅ | | |
| OPT | ✅ | ✅ | | |
| GPT-Neo | ✅ | ✅ | | |
| GPT-J | ✅ | ✅ | | |
| Deberta | ✅ | | | |
| Deberta-v2 | ✅ | | | |
## Caveats:
1. Below is an example of using PyTorch FSDP for training. However, it doesn't lead to any GPU memory savings. Please refer to the issue [[FSDP] FSDP with CPU offload consumes 1.65X more GPU memory when training models with most of the params frozen](https://github.com/pytorch/pytorch/issues/91165).
```python
from peft.utils.other import fsdp_auto_wrap_policy
...
if os.environ.get("ACCELERATE_USE_FSDP", None) is not None:
accelerator.state.fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(model)
model = accelerator.prepare(model)
```
An example of parameter efficient tuning of the `mt0-xxl` base model using 🤗 Accelerate is provided in `~examples/conditional_generation/peft_lora_seq2seq_accelerate_fsdp.py`.

a. First, run `accelerate config --config_file fsdp_config.yaml` and answer the questionnaire.
Below are the contents of the config file.
```
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: FSDP
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config:
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
fsdp_backward_prefetch_policy: BACKWARD_PRE
fsdp_offload_params: true
fsdp_sharding_strategy: 1
fsdp_state_dict_type: FULL_STATE_DICT
fsdp_transformer_layer_cls_to_wrap: T5Block
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
```
b. Run the command below to launch the example script:
```
accelerate launch --config_file fsdp_config.yaml examples/peft_lora_seq2seq_accelerate_fsdp.py
```
2. When using `P_TUNING` or `PROMPT_TUNING` with a `SEQ_2_SEQ` task, remember to remove the `num_virtual_token` virtual prompt predictions from the left side of the model outputs during evaluations.
3. `P_TUNING` or `PROMPT_TUNING` doesn't support the `generate` functionality of transformers because `generate` strictly requires `input_ids`/`decoder_input_ids`, whereas `P_TUNING`/`PROMPT_TUNING` appends soft prompt embeddings to `input_embeds` to create new `input_embeds` to be given to the model. Therefore, `generate` doesn't support this yet.
## Backlog:
1. Explore and possibly integrate `(IA)^3` and `UniPELT`
2. Add tests
3. Add more use cases and examples
## Contribute

If you would like to contribute to PEFT, please check out our [contribution guide](https://huggingface.co/docs/peft/developer_guides/contributing).
## Citing 🤗 PEFT
If you use 🤗 PEFT in your publication, please cite it by using the following BibTeX entry.
```bibtex
@Misc{peft,
title = {PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
author = {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan},
howpublished = {\url{https://github.com/huggingface/peft}},
year = {2022}
}
```
docker/README.md
# PEFT Docker images
Here we store all PEFT Docker images used in our testing infrastructure. For now, we use Python 3.8 on all our images.
- `peft-cpu`: PEFT compiled on CPU with all other HF libraries installed on main branch
- `peft-gpu`: PEFT compiled for NVIDIA GPUs with all other HF libraries installed on main branch
- `peft-gpu-bnb-source`: PEFT compiled for NVIDIA GPUs with `bitsandbytes` and all other HF libraries installed from main branch
- `peft-gpu-bnb-latest`: PEFT compiled for NVIDIA GPUs with `bitsandbytes` compiled from main and all other HF libraries installed from latest PyPI
- `peft-gpu-bnb-multi-source`: PEFT compiled for NVIDIA GPUs with `bitsandbytes` compiled from the `multi-backend` branch and all other HF libraries installed from main branch
`peft-gpu-bnb-source` and `peft-gpu-bnb-multi-source` are essentially the same, with the only difference being `bitsandbytes` compiled on another branch. Make sure to propagate the changes you applied on one file to the other!
# Builds GPU docker image of PyTorch
# Uses multi-staged approach to reduce size
# Stage 1
# Use base conda image to reduce time
FROM continuumio/miniconda3:latest AS compile-image
# Specify py version
ENV PYTHON_VERSION=3.8
# Install apt libs - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN apt-get update && \
apt-get install -y curl git wget software-properties-common git-lfs && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Install audio-related libraries
RUN apt-get update && \
apt install -y ffmpeg
RUN apt install -y libsndfile1-dev
RUN git lfs install
# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
RUN python3 -m pip install --no-cache-dir --upgrade pip
# Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# We don't install pytorch here yet since CUDA isn't available
# instead we use the direct torch wheel
ENV PATH /opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Activate the conda env and install transformers + accelerate from source
RUN source activate peft && \
python3 -m pip install --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
peft[test]@git+https://github.com/huggingface/peft
# Install apt libs
RUN apt-get update && \
apt-get install -y curl git wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
RUN echo "source activate peft" >> ~/.profile
# Activate the virtualenv
CMD ["/bin/bash"]
# Builds GPU docker image of PyTorch
# Uses multi-staged approach to reduce size
# Stage 1
# Use base conda image to reduce time
FROM continuumio/miniconda3:latest AS compile-image
# Specify py version
ENV PYTHON_VERSION=3.8
# Install apt libs - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN apt-get update && \
apt-get install -y curl git wget software-properties-common git-lfs && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Install audio-related libraries
RUN apt-get update && \
apt install -y ffmpeg
RUN apt install -y libsndfile1-dev
RUN git lfs install
# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
RUN python3 -m pip install --no-cache-dir --upgrade pip
# Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# We don't install pytorch here yet since CUDA isn't available
# instead we use the direct torch wheel
ENV PATH /opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Stage 2
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS build-image
COPY --from=compile-image /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Install apt libs
RUN apt-get update && \
apt-get install -y curl git wget cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Activate the conda env and install transformers + accelerate from latest pypi
# Also clone BNB and build it from source.
RUN source activate peft && \
python3 -m pip install -U --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
transformers \
accelerate \
peft \
optimum \
auto-gptq && \
git clone https://github.com/TimDettmers/bitsandbytes && cd bitsandbytes && \
cmake -B . -DCOMPUTE_BACKEND=cuda -S . && \
cmake --build . && \
pip install -e . && \
pip freeze | grep bitsandbytes
RUN echo "source activate peft" >> ~/.profile
# Activate the virtualenv
CMD ["/bin/bash"]
# Builds GPU docker image of PyTorch
# Uses multi-staged approach to reduce size
# Stage 1
# Use base conda image to reduce time
FROM continuumio/miniconda3:latest AS compile-image
# Specify py version
ENV PYTHON_VERSION=3.8
# Install apt libs - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN apt-get update && \
apt-get install -y curl git wget software-properties-common git-lfs && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Install audio-related libraries
RUN apt-get update && \
apt install -y ffmpeg
RUN apt install -y libsndfile1-dev
RUN git lfs install
# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
RUN python3 -m pip install --no-cache-dir --upgrade pip
# Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# We don't install pytorch here yet since CUDA isn't available
# instead we use the direct torch wheel
ENV PATH /opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Stage 2
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS build-image
COPY --from=compile-image /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Install apt libs
RUN apt-get update && \
apt-get install -y curl git wget cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Activate the conda env and install transformers + accelerate from source
# Also clone BNB and build it from source.
RUN source activate peft && \
python3 -m pip install -U --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
peft[test]@git+https://github.com/huggingface/peft \
optimum \
auto-gptq && \
git clone https://github.com/TimDettmers/bitsandbytes && cd bitsandbytes && git checkout multi-backend-refactor && \
cmake -B . -DCOMPUTE_BACKEND=cuda -S . && \
cmake --build . && \
pip install -e . && \
pip freeze | grep bitsandbytes
RUN echo "source activate peft" >> ~/.profile
# Activate the virtualenv
CMD ["/bin/bash"]
# Builds GPU docker image of PyTorch
# Uses multi-staged approach to reduce size
# Stage 1
# Use base conda image to reduce time
FROM continuumio/miniconda3:latest AS compile-image
# Specify py version
ENV PYTHON_VERSION=3.8
# Install apt libs - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN apt-get update && \
apt-get install -y curl git wget software-properties-common git-lfs && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Install audio-related libraries
RUN apt-get update && \
apt install -y ffmpeg
RUN apt install -y libsndfile1-dev
RUN git lfs install
# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
RUN python3 -m pip install --no-cache-dir --upgrade pip
# Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# We don't install pytorch here yet since CUDA isn't available
# instead we use the direct torch wheel
ENV PATH /opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Stage 2
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS build-image
COPY --from=compile-image /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Install apt libs
RUN apt-get update && \
apt-get install -y curl git wget cmake && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Activate the conda env and install transformers + accelerate from source
# Also clone BNB and build it from source.
RUN source activate peft && \
python3 -m pip install -U --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
peft[test]@git+https://github.com/huggingface/peft \
optimum \
auto-gptq && \
git clone https://github.com/TimDettmers/bitsandbytes && cd bitsandbytes && \
cmake -B . -DCOMPUTE_BACKEND=cuda -S . && \
cmake --build . && \
pip install -e . && \
pip freeze | grep bitsandbytes
RUN echo "source activate peft" >> ~/.profile
# Activate the virtualenv
CMD ["/bin/bash"]
# Builds GPU docker image of PyTorch
# Uses multi-staged approach to reduce size
# Stage 1
# Use base conda image to reduce time
FROM continuumio/miniconda3:latest AS compile-image
# Specify py version
ENV PYTHON_VERSION=3.8
# Install apt libs - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN apt-get update && \
apt-get install -y curl git wget software-properties-common git-lfs && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Install audio-related libraries
RUN apt-get update && \
apt install -y ffmpeg
RUN apt install -y libsndfile1-dev
RUN git lfs install
# Create our conda env - copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
RUN conda create --name peft python=${PYTHON_VERSION} ipython jupyter pip
RUN python3 -m pip install --no-cache-dir --upgrade pip
# Below is copied from https://github.com/huggingface/accelerate/blob/main/docker/accelerate-gpu/Dockerfile
# We don't install pytorch here yet since CUDA isn't available
# instead we use the direct torch wheel
ENV PATH /opt/conda/envs/peft/bin:$PATH
# Activate our bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
# Stage 2
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04 AS build-image
COPY --from=compile-image /opt/conda /opt/conda
ENV PATH /opt/conda/bin:$PATH
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]
RUN source activate peft && \
python3 -m pip install --no-cache-dir bitsandbytes optimum auto-gptq
# Add autoawq for quantization testing
RUN source activate peft && \
python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.4/autoawq-0.2.4-cp38-cp38-linux_x86_64.whl
RUN source activate peft && \
python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ_kernels/releases/download/v0.0.6/autoawq_kernels-0.0.6-cp38-cp38-linux_x86_64.whl
# Install apt libs
RUN apt-get update && \
apt-get install -y curl git wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists*
# Add eetq for quantization testing
RUN source activate peft && \
python3 -m pip install git+https://github.com/NetEase-FuXi/EETQ.git
# Activate the conda env and install transformers + accelerate from source
RUN source activate peft && \
python3 -m pip install -U --no-cache-dir \
librosa \
"soundfile>=0.12.1" \
scipy \
git+https://github.com/huggingface/transformers \
git+https://github.com/huggingface/accelerate \
peft[test]@git+https://github.com/huggingface/peft
# Add aqlm for quantization testing
RUN source activate peft && \
pip install aqlm[gpu]>=1.0.2
# Add HQQ for quantization testing
RUN source activate peft && \
pip install hqq
RUN source activate peft && \
pip freeze | grep transformers
RUN echo "source activate peft" >> ~/.profile
# Activate the virtualenv
CMD ["/bin/bash"]
docs/Makefile
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = source
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
docs/README.md
<!---
Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Generating the documentation
To generate the documentation, you first have to build it. Several packages are necessary to build the doc,
you can install them with the following command, at the root of the code repository:
```bash
pip install -e ".[docs]"
```
Then you need to install our special tool that builds the documentation:
```bash
pip install git+https://github.com/huggingface/doc-builder
```
---
**NOTE**
You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look before committing for instance). You don't have to commit to the built documentation.
---
## Building the documentation
Once you have setup the `doc-builder` and additional packages, you can generate the documentation by
typing the following command:
```bash
doc-builder build peft docs/source/ --build_dir ~/tmp/test-build
```
You can adapt the `--build_dir` to set any temporary folder you prefer. This command will create it and generate
the MDX files that will be rendered as the documentation on the main website. You can inspect them in your favorite
Markdown editor.
## Previewing the documentation
To preview the docs, first install the `watchdog` module with:
```bash
pip install watchdog
```
Then run the following command:
```bash
doc-builder preview {package_name} {path_to_docs}
```
For example:
```bash
doc-builder preview peft docs/source
```
The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment to a link where the documentation with your changes lives.
---
**NOTE**
The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` & restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
---
## Adding a new element to the navigation bar
Accepted files are Markdown (.md or .mdx).
Create a file with its extension and put it in the source directory. You can then link it to the toc-tree by putting
the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/peft/blob/main/docs/source/_toctree.yml) file.
## Renaming section headers and moving sections
It helps to keep the old links working when renaming the section header and/or moving sections from one document to another. This is because the old links are likely to be used in Issues, Forums, and Social media and it'd make for a much better user experience if users reading those months later could still easily navigate to the originally intended information.
Therefore, we simply keep a little map of moved sections at the end of the document where the original section was. The key is to preserve the original anchor.
So if you renamed a section from: "Section A" to "Section B", then you can add at the end of the file:
```
Sections that were moved:
[ <a href="#section-b">Section A</a><a id="section-a"></a> ]
```
and of course, if you moved it to another file, then:
```
Sections that were moved:
[ <a href="../new-file#section-b">Section A</a><a id="section-a"></a> ]
```
Use the relative style to link to the new file so that the versioned docs continue to work.
## Writing Documentation - Specification
The `huggingface/peft` documentation follows the
[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style for docstrings,
although we can write them directly in Markdown.
### Adding a new tutorial
Adding a new tutorial or section is done in two steps:
- Add a new file under `./source`. This file can either be ReStructuredText (.rst) or Markdown (.md).
- Link that file in `./source/_toctree.yml` on the correct toc-tree.
Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
depending on the intended targets (beginners, more advanced users, or researchers) it should go into sections two, three, or
four.
### Writing source documentation
Values that should be put in `code` should either be surrounded by backticks: \`like so\`. Note that argument names
and objects like True, None, or any strings should usually be put in `code`.
When mentioning a class, function, or method, it is recommended to use our syntax for internal links so that our tool
adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or
function to be in the main package.
If you want to create a link to some internal class or function, you need to
provide its path. For instance: \[\`utils.gather\`\]. This will be converted into a link with
`utils.gather` in the description. To get rid of the path and only keep the name of the object you are
linking to in the description, add a ~: \[\`~utils.gather\`\] will generate a link with `gather` in the description.
The same works for methods, so you can use either \[\`XXXClass.method\`\] or \[\`~XXXClass.method\`\].
#### Defining arguments in a method
Arguments should be defined with the `Args:` (or `Arguments:` or `Parameters:`) prefix, followed by a line return and
an indentation. The argument should be followed by its type, with its shape if it is a tensor, a colon, and its
description:
```
Args:
n_layers (`int`): The number of layers of the model.
```
If the description is too long to fit in one line (more than 119 characters in total), another indentation is necessary
before writing the description after the argument.
Finally, to maintain uniformity if any *one* description is too long to fit on one line, the
rest of the parameters should follow suit and have an indention before their description.
Here's an example showcasing everything so far:
```
Args:
gradient_accumulation_steps (`int`, *optional*, default to 1):
The number of steps that should pass before gradients are accumulated. A number > 1 should be combined with `Accelerator.accumulate`.
cpu (`bool`, *optional*):
Whether or not to force the script to execute on CPU. Will ignore GPU available if set to `True` and force the execution on one process only.
```
For optional arguments or arguments with defaults we follow the following syntax: imagine we have a function with the
following signature:
```
def my_function(x: str = None, a: float = 1):
```
then its documentation should look like this:
```
Args:
x (`str`, *optional*):
This argument controls ... and has a description longer than 119 chars.
a (`float`, *optional*, defaults to 1):
This argument is used to ... and has a description longer than 119 chars.
```
Note that we always omit the "defaults to \`None\`" when None is the default for any argument. Also note that even
if the first line describing your argument type and its default gets long, you can't break it into several lines. You can
however write as many lines as you want in the indented description (see the example above).
#### Writing a multi-line code block
Multi-line code blocks can be useful for displaying examples. They are done between two lines of three backticks as usual in Markdown:
````
```python
# first line of code
# second line
# etc
```
````
#### Writing a return block
The return block should be introduced with the `Returns:` prefix, followed by a line return and an indentation.
The first line should be the type of the return, followed by a line return. No need to indent further for the elements
building the return.
Here's an example of a single value return:
```
Returns:
`List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
```
Here's an example of a tuple return, comprising several objects:
```
Returns:
`tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
- **loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` --
Total loss is the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
- **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
```
## Styling the docstring
We have an automatic script running with the `make style` command that will make sure that:
- the docstrings fully take advantage of the line width
- all code examples are formatted using black, like the code of the Transformers library
This script may have some weird failures if you make a syntax mistake or if you uncover a bug. Therefore, it's
recommended to commit your changes before running `make style`, so you can revert the changes done by that script
easily.
## Writing documentation examples
The syntax for example docstrings can look as follows:
```
Example:
```python
>>> import time
>>> from accelerate import Accelerator
>>> accelerator = Accelerator()
>>> if accelerator.is_main_process:
... time.sleep(2)
>>> else:
... print("I'm waiting for the main process to finish its sleep...")
>>> accelerator.wait_for_everyone()
>>> # Should print on every process at the same time
>>> print("Everyone is here")
```
```
The docstring should give a minimal, clear example of how the respective function
is to be used in inference and also include the expected (ideally sensible)
output.
Often, readers will try out the example before even going through the function
or class definitions. Therefore, it is of utmost importance that the example
works as expected.
docs/source/_config.py
# docstyle-ignore
INSTALL_CONTENT = """
# PEFT installation
! pip install peft accelerate transformers
# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/peft.git
"""
docs/source/_toctree.yml
- title: Get started
sections:
- local: index
title: 🤗 PEFT
- local: quicktour
title: Quicktour
- local: install
title: Installation
- title: Tutorial
sections:
- local: tutorial/peft_model_config
title: Configurations and models
- local: tutorial/peft_integrations
title: Integrations
- title: PEFT method guides
sections:
- local: task_guides/prompt_based_methods
title: Prompt-based methods
- local: task_guides/lora_based_methods
title: LoRA methods
- local: task_guides/ia3
title: IA3
- title: Developer guides
sections:
- local: developer_guides/model_merging
title: Model merging
- local: developer_guides/quantization
title: Quantization
- local: developer_guides/lora
title: LoRA
- local: developer_guides/custom_models
title: Custom models
- local: developer_guides/low_level_api
title: Adapter injection
- local: developer_guides/mixed_models
title: Mixed adapter types
- local: developer_guides/torch_compile
title: torch.compile
- local: developer_guides/contributing
title: Contribute to PEFT
- local: developer_guides/troubleshooting
title: Troubleshooting
- local: developer_guides/checkpoint
title: PEFT checkpoint format
- title: 🤗 Accelerate integrations
sections:
- local: accelerate/deepspeed
title: DeepSpeed
- local: accelerate/fsdp
title: Fully Sharded Data Parallel
- title: Conceptual guides
sections:
- local: conceptual_guides/adapter
title: Adapters
- local: conceptual_guides/prompting
title: Soft prompts
- local: conceptual_guides/ia3
title: IA3
- local: conceptual_guides/oft
title: OFT/BOFT
- sections:
- sections:
- local: package_reference/auto_class
title: AutoPeftModel
- local: package_reference/peft_model
title: PEFT model
- local: package_reference/peft_types
title: PEFT types
- local: package_reference/config
title: Configuration
- local: package_reference/tuners
title: Tuner
title: Main classes
- sections:
- local: package_reference/adalora
title: AdaLoRA
- local: package_reference/ia3
title: IA3
- local: package_reference/llama_adapter
title: Llama-Adapter
- local: package_reference/loha
title: LoHa
- local: package_reference/lokr
title: LoKr
- local: package_reference/lora
title: LoRA
- local: package_reference/xlora
title: X-LoRA
- local: package_reference/adapter_utils
title: LyCORIS
- local: package_reference/multitask_prompt_tuning
title: Multitask Prompt Tuning
- local: package_reference/oft
title: OFT
- local: package_reference/boft
title: BOFT
- local: package_reference/poly
title: Polytropon
- local: package_reference/p_tuning
title: P-tuning
- local: package_reference/prefix_tuning
title: Prefix tuning
- local: package_reference/prompt_tuning
title: Prompt tuning
- local: package_reference/layernorm_tuning
title: Layernorm tuning
- local: package_reference/vera
title: VeRA
- local: package_reference/fourierft
title: FourierFT
- local: package_reference/vblora
title: VB-LoRA
title: Adapters
- sections:
- local: package_reference/merge_utils
title: Model merge
- local: package_reference/helpers
title: Helpers
title: Utilities
title: API reference
docs/source/accelerate/deepspeed.md
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# DeepSpeed
[DeepSpeed](https://www.deepspeed.ai/) is a library designed for speed and scale for distributed training of large models with billions of parameters. At its core is the Zero Redundancy Optimizer (ZeRO) that shards optimizer states (ZeRO-1), gradients (ZeRO-2), and parameters (ZeRO-3) across data parallel processes. This drastically reduces memory usage, allowing you to scale your training to billion parameter models. To unlock even more memory efficiency, ZeRO-Offload reduces GPU compute and memory by leveraging CPU resources during optimization.
Both of these features are supported in 🤗 Accelerate, and you can use them with 🤗 PEFT.
## Compatibility with `bitsandbytes` quantization + LoRA
Below is a table that summarizes the compatibility between PEFT's LoRA, [`bitsandbytes`](https://github.com/TimDettmers/bitsandbytes) library and DeepSpeed Zero stages with respect to fine-tuning. DeepSpeed Zero-1 and 2 will have no effect at inference as stage 1 shards the optimizer states and stage 2 shards the optimizer states and gradients:
| DeepSpeed stage | Is compatible? |
|---|---|
| Zero-1 | 🟢 |
| Zero-2 | 🟢 |
| Zero-3 | 🟢 |
For DeepSpeed Stage 3 + QLoRA, please refer to the section [Use PEFT QLoRA and DeepSpeed with ZeRO3 for finetuning large models on multiple GPUs](#use-peft-qlora-and-deepspeed-with-zero3-for-finetuning-large-models-on-multiple-gpus) below.
To confirm these observations, we ran the SFT (Supervised Fine-tuning) [official example scripts](https://github.com/huggingface/trl/tree/main/examples) of the [Transformers Reinforcement Learning (TRL) library](https://github.com/huggingface/trl) using QLoRA + PEFT and the accelerate configs available [here](https://github.com/huggingface/trl/tree/main/examples/accelerate_configs). We ran these experiments on 2x NVIDIA T4 GPUs.
# Use PEFT and DeepSpeed with ZeRO3 for finetuning large models on multiple devices and multiple nodes
This section of the guide will help you learn how to use our DeepSpeed [training script](https://github.com/huggingface/peft/blob/main/examples/sft/train.py) for performing SFT. You'll configure the script to do SFT (supervised fine-tuning) of the Llama-70B model with LoRA and ZeRO-3 on 8x H100 80GB GPUs on a single machine. You can configure it to scale to multiple machines by changing the accelerate config.
## Configuration
Start by running the following command to [create a DeepSpeed configuration file](https://huggingface.co/docs/accelerate/quicktour#launching-your-distributed-script) with 🤗 Accelerate. The `--config_file` flag allows you to save the configuration file to a specific location, otherwise it is saved as a `default_config.yaml` file in the 🤗 Accelerate cache.
The configuration file is used to set the default options when you launch the training script.
```bash
accelerate config --config_file deepspeed_config.yaml
```
You'll be asked a few questions about your setup and to configure the following arguments. In this example, you'll use ZeRO-3, so make sure you pick those options.
```bash
`zero_stage`: [0] Disabled, [1] optimizer state partitioning, [2] optimizer+gradient state partitioning and [3] optimizer+gradient+parameter partitioning
`gradient_accumulation_steps`: Number of training steps to accumulate gradients before averaging and applying them. Pass the same value as you would pass via the cmd argument, else you will encounter a mismatch error.
`gradient_clipping`: Enable gradient clipping with value. Don't set this as you will be passing it via cmd arguments.
`offload_optimizer_device`: [none] Disable optimizer offloading, [cpu] offload optimizer to CPU, [nvme] offload optimizer to NVMe SSD. Only applicable with ZeRO >= Stage-2. Set this to `none` as we don't want to enable offloading.
`offload_param_device`: [none] Disable parameter offloading, [cpu] offload parameters to CPU, [nvme] offload parameters to NVMe SSD. Only applicable with ZeRO Stage-3. Set this to `none` as we don't want to enable offloading.
`zero3_init_flag`: Decides whether to enable `deepspeed.zero.Init` for constructing massive models. Only applicable with ZeRO Stage-3. Set this to `True`.
`zero3_save_16bit_model`: Decides whether to save 16-bit model weights when using ZeRO Stage-3. Set this to `True`.
`mixed_precision`: `no` for FP32 training, `fp16` for FP16 mixed-precision training and `bf16` for BF16 mixed-precision training. Set this to `bf16`.
```
Once this is done, the corresponding config should look like the one below, and you can find it in the config folder at [deepspeed_config.yaml](https://github.com/huggingface/peft/blob/main/examples/sft/configs/deepspeed_config.yaml):
```yml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
deepspeed_multinode_launcher: standard
gradient_accumulation_steps: 4
offload_optimizer_device: none
offload_param_device: none
zero3_init_flag: true
zero3_save_16bit_model: true
zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
## Launch command
The launch command is available at [run_peft_deepspeed.sh](https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_deepspeed.sh) and it is also shown below:
```bash
accelerate launch --config_file "configs/deepspeed_config.yaml" train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-70b-hf" \
--dataset_name "smangrul/ultrachat-10k-chatml" \
--chat_template_format "chatml" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_ratio 0.0 \
--max_grad_norm 1.0 \
--output_dir "llama-sft-lora-deepspeed" \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 4 \
--gradient_checkpointing True \
--use_reentrant False \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization False
```
Notice that we are using LoRA with rank=8, alpha=16 and targeting all linear layers. We are passing the DeepSpeed config file and finetuning the 70B Llama model on a subset of the ultrachat dataset.
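In PEFT terms, those flags correspond roughly to the following config (a sketch, not the exact object the script builds):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules="all-linear",  # adapt every linear layer except the output head
    task_type="CAUSAL_LM",
)
```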
## The important parts
Let's dive a little deeper into the script so you can see what's going on, and understand how it works.
The first thing to know is that the script uses DeepSpeed for distributed training as the DeepSpeed config has been passed. The `SFTTrainer` class handles all the heavy lifting of creating the PEFT model using the peft config that is passed. After that, when you call `trainer.train()`, `SFTTrainer` internally uses 🤗 Accelerate to prepare the model, optimizer and dataloaders using the DeepSpeed config to create the DeepSpeed engine, which is then trained. The main code snippet is below:
```python
# trainer
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
peft_config=peft_config,
packing=data_args.packing,
dataset_kwargs={
"append_concat_token": data_args.append_concat_token,
"add_special_tokens": data_args.add_special_tokens,
},
dataset_text_field=data_args.dataset_text_field,
max_seq_length=data_args.max_seq_length,
)
trainer.accelerator.print(f"{trainer.model}")
# train
checkpoint = None
if training_args.resume_from_checkpoint is not None:
checkpoint = training_args.resume_from_checkpoint
trainer.train(resume_from_checkpoint=checkpoint)
# saving final model
trainer.save_model()
```
## Memory usage
In the above example, the memory consumed per GPU is 64 GB (80%) as seen in the screenshot below:
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/peft_deepspeed_mem_usage.png"/>
</div>
<small>GPU memory usage for the training run</small>
## More resources
You can also refer to the blog post [Falcon 180B Finetuning using 🤗 PEFT and DeepSpeed](https://medium.com/@sourabmangrulkar/falcon-180b-finetuning-using-peft-and-deepspeed-b92643091d99) to learn how to finetune the 180B Falcon model on 16 A100 GPUs across 2 machines.
# Use PEFT QLoRA and DeepSpeed with ZeRO3 for finetuning large models on multiple GPUs
In this section, we will look at how to use QLoRA and DeepSpeed Stage-3 for finetuning a 70B Llama model on 2x 40GB GPUs.
For this, we first need `bitsandbytes>=0.43.0`, `accelerate>=0.28.0`, `transformers>4.38.2`, `trl>0.7.11` and `peft>0.9.0`. We need to set `zero3_init_flag` to true when using Accelerate config. Below is the config which can be found at [deepspeed_config_z3_qlora.yaml](https://github.com/huggingface/peft/blob/main/examples/sft/configs/deepspeed_config_z3_qlora.yaml):
```yml
compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
deepspeed_multinode_launcher: standard
offload_optimizer_device: none
offload_param_device: none
zero3_init_flag: true
zero3_save_16bit_model: true
zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
The launch command is given below and is also available at [run_peft_qlora_deepspeed_stage3.sh](https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_deepspeed.sh):
```
accelerate launch --config_file "configs/deepspeed_config_z3_qlora.yaml" train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-70b-hf" \
--dataset_name "smangrul/ultrachat-10k-chatml" \
--chat_template_format "chatml" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_ratio 0.0 \
--max_grad_norm 1.0 \
--output_dir "llama-sft-qlora-dsz3" \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 2 \
--gradient_checkpointing True \
--use_reentrant True \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization True \
--use_nested_quant True \
--bnb_4bit_compute_dtype "bfloat16" \
--bnb_4bit_quant_storage_dtype "bfloat16"
```
Notice the new argument being passed `bnb_4bit_quant_storage_dtype` which denotes the data type for packing the 4-bit parameters. For example, when it is set to `bfloat16`, **32/4 = 8** 4-bit params are packed together post quantization.
In terms of training code, the important code changes are:
```diff
...
bnb_config = BitsAndBytesConfig(
load_in_4bit=args.use_4bit_quantization,
bnb_4bit_quant_type=args.bnb_4bit_quant_type,
bnb_4bit_compute_dtype=compute_dtype,
bnb_4bit_use_double_quant=args.use_nested_quant,
+ bnb_4bit_quant_storage=quant_storage_dtype,
)
...
model = AutoModelForCausalLM.from_pretrained(
args.model_name_or_path,
quantization_config=bnb_config,
trust_remote_code=True,
attn_implementation="flash_attention_2" if args.use_flash_attn else "eager",
+ torch_dtype=quant_storage_dtype or torch.float32,
)
```
Notice that `torch_dtype` for `AutoModelForCausalLM` is the same as the `bnb_4bit_quant_storage` data type. That's it. Everything else is handled by Trainer and TRL.
## Memory usage
In the above example, the memory consumed per GPU is **36.6 GB**. Therefore, what took 8x 80GB GPUs with DeepSpeed Stage-3 + LoRA and a couple of 80GB GPUs with DDP + QLoRA now requires 2x 40GB GPUs. This makes finetuning of large models more accessible.
# Use PEFT and DeepSpeed with ZeRO3 and CPU Offloading for finetuning large models on a single GPU
This section of the guide will help you learn how to use our DeepSpeed [training script](https://github.com/huggingface/peft/blob/main/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py). You'll configure the script to train a large model for conditional generation with ZeRO-3 and CPU Offload.
<Tip>
💡 To help you get started, check out our example training scripts for [causal language modeling](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_accelerate_ds_zero3_offload.py) and [conditional generation](https://github.com/huggingface/peft/blob/main/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py). You can adapt these scripts for your own applications or even use them out of the box if your task is similar to the one in the scripts.
</Tip>
## Configuration
Start by running the following command to [create a DeepSpeed configuration file](https://huggingface.co/docs/accelerate/quicktour#launching-your-distributed-script) with 🤗 Accelerate. The `--config_file` flag allows you to save the configuration file to a specific location, otherwise it is saved as a `default_config.yaml` file in the 🤗 Accelerate cache.
The configuration file is used to set the default options when you launch the training script.
```bash
accelerate config --config_file ds_zero3_cpu.yaml
```
You'll be asked a few questions about your setup and to configure the following arguments. In this example, you'll use ZeRO-3 along with CPU-Offload, so make sure you pick those options.
```bash
`zero_stage`: [0] Disabled, [1] optimizer state partitioning, [2] optimizer+gradient state partitioning and [3] optimizer+gradient+parameter partitioning
`gradient_accumulation_steps`: Number of training steps to accumulate gradients before averaging and applying them.
`gradient_clipping`: Enable gradient clipping with value.
`offload_optimizer_device`: [none] Disable optimizer offloading, [cpu] offload optimizer to CPU, [nvme] offload optimizer to NVMe SSD. Only applicable with ZeRO >= Stage-2.
`offload_param_device`: [none] Disable parameter offloading, [cpu] offload parameters to CPU, [nvme] offload parameters to NVMe SSD. Only applicable with ZeRO Stage-3.
`zero3_init_flag`: Decides whether to enable `deepspeed.zero.Init` for constructing massive models. Only applicable with ZeRO Stage-3.
`zero3_save_16bit_model`: Decides whether to save 16-bit model weights when using ZeRO Stage-3.
`mixed_precision`: `no` for FP32 training, `fp16` for FP16 mixed-precision training and `bf16` for BF16 mixed-precision training.
```
An example [configuration file](https://github.com/huggingface/peft/blob/main/examples/conditional_generation/accelerate_ds_zero3_cpu_offload_config.yaml) might look like the following. The most important thing to notice is that `zero_stage` is set to `3`, and `offload_optimizer_device` and `offload_param_device` are set to the `cpu`.
```yml
compute_environment: LOCAL_MACHINE
deepspeed_config:
gradient_accumulation_steps: 1
gradient_clipping: 1.0
offload_optimizer_device: cpu
offload_param_device: cpu
zero3_init_flag: true
zero3_save_16bit_model: true
zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
use_cpu: false
```
## The important parts
Let's dive a little deeper into the script so you can see what's going on, and understand how it works.
Within the [`main`](https://github.com/huggingface/peft/blob/2822398fbe896f25d4dac5e468624dc5fd65a51b/examples/conditional_generation/peft_lora_seq2seq_accelerate_ds_zero3_offload.py#L103) function, the script creates an [`~accelerate.Accelerator`] class to initialize all the necessary requirements for distributed training.
<Tip>
💡 Feel free to change the model and dataset inside the `main` function. If your dataset format is different from the one in the script, you may also need to write your own preprocessing function.
</Tip>
The script also creates a configuration for the 🤗 PEFT method you're using, which in this case, is LoRA. The [`LoraConfig`] specifies the task type and important parameters such as the dimension of the low-rank matrices, the matrices scaling factor, and the dropout probability of the LoRA layers. If you want to use a different 🤗 PEFT method, make sure you replace `LoraConfig` with the appropriate [class](../package_reference/tuners).
```diff
def main():
+ accelerator = Accelerator()
model_name_or_path = "facebook/bart-large"
dataset_name = "twitter_complaints"
+ peft_config = LoraConfig(
task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)
```
Throughout the script, you'll see the [`~accelerate.Accelerator.main_process_first`] and [`~accelerate.Accelerator.wait_for_everyone`] functions which help control and synchronize when processes are executed.
The [`get_peft_model`] function takes a base model and the [`peft_config`] you prepared earlier to create a [`PeftModel`]:
```diff
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
+ model = get_peft_model(model, peft_config)
```
Pass all the relevant training objects to 🤗 Accelerate's [`~accelerate.Accelerator.prepare`] which makes sure everything is ready for training:
```py
model, train_dataloader, eval_dataloader, test_dataloader, optimizer, lr_scheduler = accelerator.prepare(
model, train_dataloader, eval_dataloader, test_dataloader, optimizer, lr_scheduler
)
```
The next bit of code checks whether the DeepSpeed plugin is used in the `Accelerator` and, if so, whether ZeRO-3 is enabled. This conditional flag is later passed to the `generate` call during inference to sync the GPUs while the model parameters are sharded:
```py
is_ds_zero_3 = False
if getattr(accelerator.state, "deepspeed_plugin", None):
is_ds_zero_3 = accelerator.state.deepspeed_plugin.zero_stage == 3
```
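As a hedged sketch of how this flag might be used at evaluation time (variable names follow the earlier snippets; the exact call in the script may differ):
```py
import torch

# minimal sketch: synced_gpus keeps all ranks running forward passes during
# generation, which is required when parameters are sharded under ZeRO-3
with torch.no_grad():
    outputs = accelerator.unwrap_model(model).generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        max_new_tokens=10,
        synced_gpus=is_ds_zero_3,
    )
```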
Inside the training loop, the usual `loss.backward()` is replaced by 🤗 Accelerate's [`~accelerate.Accelerator.backward`] which uses the correct `backward()` method based on your configuration:
```diff
for epoch in range(num_epochs):
with TorchTracemalloc() as tracemalloc:
model.train()
total_loss = 0
for step, batch in enumerate(tqdm(train_dataloader)):
outputs = model(**batch)
loss = outputs.loss
total_loss += loss.detach().float()
+ accelerator.backward(loss)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
```
That is all! The rest of the script handles the training loop, evaluation, and even pushes the trained model to the Hub for you.
## Train
Run the following command to launch the training script. Earlier, you saved the configuration file to `ds_zero3_cpu.yaml`, so you'll need to pass the path to the launcher with the `--config_file` argument like this:
```bash
accelerate launch --config_file ds_zero3_cpu.yaml examples/peft_lora_seq2seq_accelerate_ds_zero3_offload.py
```
You'll see some output logs that track memory usage during training, and once it's completed, the script returns the accuracy and compares the predictions to the labels:
```bash
GPU Memory before entering the train : 1916
GPU Memory consumed at the end of the train (end-begin): 66
GPU Peak Memory consumed during the train (max-begin): 7488
GPU Total Peak Memory consumed during the train (max): 9404
CPU Memory before entering the train : 19411
CPU Memory consumed at the end of the train (end-begin): 0
CPU Peak Memory consumed during the train (max-begin): 0
CPU Total Peak Memory consumed during the train (max): 19411
epoch=4: train_ppl=tensor(1.0705, device='cuda:0') train_epoch_loss=tensor(0.0681, device='cuda:0')
100%|████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:27<00:00, 3.92s/it]
GPU Memory before entering the eval : 1982
GPU Memory consumed at the end of the eval (end-begin): -66
GPU Peak Memory consumed during the eval (max-begin): 672
GPU Total Peak Memory consumed during the eval (max): 2654
CPU Memory before entering the eval : 19411
CPU Memory consumed at the end of the eval (end-begin): 0
CPU Peak Memory consumed during the eval (max-begin): 0
CPU Total Peak Memory consumed during the eval (max): 19411
accuracy=100.0
eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']
dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']
```
# Caveats
1. Merging when using PEFT and DeepSpeed is currently unsupported and will raise an error.
2. When using CPU offloading, the major gains from using PEFT to shrink the optimizer states and gradients to those of the adapter weights are realized in CPU RAM; there won't be savings with respect to GPU memory.
3. DeepSpeed Stage 3 and QLoRA, when used with CPU offloading, lead to more GPU memory usage compared to disabling CPU offloading.


@ -0,0 +1,292 @@
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Fully Sharded Data Parallel
[Fully sharded data parallel](https://pytorch.org/docs/stable/fsdp.html) (FSDP) is developed for distributed training of large pretrained models up to 1T parameters. FSDP achieves this by sharding the model parameters, gradients, and optimizer states across data parallel processes and it can also offload sharded model parameters to a CPU. The memory efficiency afforded by FSDP allows you to scale training to larger batch or model sizes.
Both of these features are supported in 🤗 Accelerate, and you can use them with 🤗 PEFT.
# Use PEFT and FSDP
This section of the guide will help you learn how to use our SFT [training script](https://github.com/huggingface/peft/blob/main/examples/sft/train.py). You'll configure the script to do SFT (supervised fine-tuning) of the Llama 70B model with LoRA and FSDP on 8x H100 80GB GPUs on a single machine. You can configure it to scale to multiple machines by changing the accelerate config.
## Configuration
Start by running the following command to [create a FSDP configuration file](https://huggingface.co/docs/accelerate/quicktour#launching-your-distributed-script) with 🤗 Accelerate. The `--config_file` flag allows you to save the configuration file to a specific location, otherwise it is saved as a `default_config.yaml` file in the 🤗 Accelerate cache.
The configuration file is used to set the default options when you launch the training script.
```bash
accelerate config --config_file fsdp_config.yaml
```
You'll be asked a few questions about your setup and to configure the following arguments. In this example, you'll answer the questionnaire as shown in the image below.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/fsdp-peft-config.png"/>
</div>
<small>Creating Accelerate's config to use FSDP</small>
Once this is done, the corresponding config should look like the example below, and you can find it in the config folder at [fsdp_config.yaml](https://github.com/huggingface/peft/blob/main/examples/sft/configs/fsdp_config.yaml):
```yml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
fsdp_backward_prefetch: BACKWARD_PRE
fsdp_cpu_ram_efficient_loading: true
fsdp_forward_prefetch: false
fsdp_offload_params: false
fsdp_sharding_strategy: FULL_SHARD
fsdp_state_dict_type: SHARDED_STATE_DICT
fsdp_sync_module_states: true
fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
## Launch command
The launch command is available at [run_peft_fsdp.sh](https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_fsdp.sh) and it is also shown below:
```bash
accelerate launch --config_file "configs/fsdp_config.yaml" train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-70b-hf" \
--dataset_name "smangrul/ultrachat-10k-chatml" \
--chat_template_format "chatml" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_ratio 0.0 \
--max_grad_norm 1.0 \
--output_dir "llama-sft-lora-fsdp" \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--gradient_accumulation_steps 4 \
--gradient_checkpointing True \
--use_reentrant False \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization False
```
Notice that we are using LoRA with rank=8, alpha=16 and targeting all linear layers. We are passing the FSDP config file and finetuning the 70B Llama model on a subset of the [ultrachat dataset](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k).
## The important parts
Let's dive a little deeper into the script so you can see what's going on, and understand how it works.
The first thing to know is that the script uses FSDP for distributed training because the FSDP config has been passed. The `SFTTrainer` class handles all the heavy lifting of creating the PEFT model using the PEFT config that is passed. After that, when you call `trainer.train()`, the Trainer internally uses 🤗 Accelerate to prepare the model, optimizer, and dataloaders using the FSDP config, creating an FSDP-wrapped model which is then trained. The main code snippet is below:
```python
# trainer
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
peft_config=peft_config,
packing=data_args.packing,
dataset_kwargs={
"append_concat_token": data_args.append_concat_token,
"add_special_tokens": data_args.add_special_tokens,
},
dataset_text_field=data_args.dataset_text_field,
max_seq_length=data_args.max_seq_length,
)
trainer.accelerator.print(f"{trainer.model}")
if model_args.use_peft_lora:
# handle PEFT+FSDP case
trainer.model.print_trainable_parameters()
if getattr(trainer.accelerator.state, "fsdp_plugin", None):
from peft.utils.other import fsdp_auto_wrap_policy
fsdp_plugin = trainer.accelerator.state.fsdp_plugin
fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(trainer.model)
# train
checkpoint = None
if training_args.resume_from_checkpoint is not None:
checkpoint = training_args.resume_from_checkpoint
trainer.train(resume_from_checkpoint=checkpoint)
# saving final model
if trainer.is_fsdp_enabled:
trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")
trainer.save_model()
```
Here, one main thing to note when using FSDP with PEFT is that `use_orig_params` currently needs to be `False` to realize the GPU memory savings. With `use_orig_params=False`, the auto wrap policy for FSDP needs to change so that trainable and non-trainable parameters are wrapped separately. This is done by the code snippet below, which uses the utility function `fsdp_auto_wrap_policy` from PEFT:
```py
if getattr(trainer.accelerator.state, "fsdp_plugin", None):
from peft.utils.other import fsdp_auto_wrap_policy
fsdp_plugin = trainer.accelerator.state.fsdp_plugin
fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(trainer.model)
```
## Memory usage
In the above example, the memory consumed per GPU is 72-80 GB (90-98%) as seen in the screenshot below. The slight increase in GPU memory at the end comes from saving the model with the `FULL_STATE_DICT` state dict type instead of `SHARDED_STATE_DICT`, so that the model has adapter weights that can be loaded normally with the `from_pretrained` method during inference:
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/peft_fsdp_mem_usage.png"/>
</div>
<small>GPU memory usage for the training run</small>
# Use PEFT QLoRA and FSDP for finetuning large models on multiple GPUs
In this section, we will look at how to use QLoRA and FSDP for finetuning the 70B Llama model on 2x 24GB GPUs. [Answer.AI](https://www.answer.ai/), in collaboration with bitsandbytes and Hugging Face 🤗, open sourced code enabling the usage of FSDP+QLoRA and explained the whole process in their insightful blogpost [You can now train a 70b language model at home](https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html). This is now integrated into the Hugging Face ecosystem.
For this, we first need `bitsandbytes>=0.43.0`, `accelerate>=0.28.0`, `transformers>4.38.2`, `trl>0.7.11` and `peft>0.9.0`. We need to set `fsdp_cpu_ram_efficient_loading=true`, `fsdp_use_orig_params=false` and `fsdp_offload_params=true` (CPU offloading) when using the Accelerate config. When not using the accelerate launcher, you can alternatively set the environment variable `export FSDP_CPU_RAM_EFFICIENT_LOADING=true`. Here, we will be using the accelerate config; below is the config, which can be found at [fsdp_config_qlora.yaml](https://github.com/huggingface/peft/blob/main/examples/sft/configs/fsdp_config_qlora.yaml):
```yml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
fsdp_backward_prefetch: BACKWARD_PRE
fsdp_cpu_ram_efficient_loading: true
fsdp_forward_prefetch: false
fsdp_offload_params: true
fsdp_sharding_strategy: FULL_SHARD
fsdp_state_dict_type: SHARDED_STATE_DICT
fsdp_sync_module_states: true
fsdp_use_orig_params: false
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
The launch command is shown below and is also available at [run_peft_qlora_fsdp.sh](https://github.com/huggingface/peft/blob/main/examples/sft/run_peft_qlora_fsdp.sh):
```bash
accelerate launch --config_file "configs/fsdp_config_qlora.yaml" train.py \
--seed 100 \
--model_name_or_path "meta-llama/Llama-2-70b-hf" \
--dataset_name "smangrul/ultrachat-10k-chatml" \
--chat_template_format "chatml" \
--add_special_tokens False \
--append_concat_token False \
--splits "train,test" \
--max_seq_len 2048 \
--num_train_epochs 1 \
--logging_steps 5 \
--log_level "info" \
--logging_strategy "steps" \
--eval_strategy "epoch" \
--save_strategy "epoch" \
--push_to_hub \
--hub_private_repo True \
--hub_strategy "every_save" \
--bf16 True \
--packing True \
--learning_rate 1e-4 \
--lr_scheduler_type "cosine" \
--weight_decay 1e-4 \
--warmup_ratio 0.0 \
--max_grad_norm 1.0 \
--output_dir "llama-sft-qlora-fsdp" \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 2 \
--gradient_checkpointing True \
--use_reentrant True \
--dataset_text_field "content" \
--use_flash_attn True \
--use_peft_lora True \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.1 \
--lora_target_modules "all-linear" \
--use_4bit_quantization True \
--use_nested_quant True \
--bnb_4bit_compute_dtype "bfloat16" \
--bnb_4bit_quant_storage_dtype "bfloat16"
```
Notice the new argument being passed, `bnb_4bit_quant_storage_dtype`, which denotes the data type for packing the 4-bit parameters. For example, when it is set to `bfloat16`, **16/4 = 4** 4-bit params are packed together post quantization. When using mixed precision training with `bfloat16`, `bnb_4bit_quant_storage_dtype` can be either `bfloat16` for pure `bfloat16` finetuning, or `float32` for automatic mixed precision (this consumes more GPU memory). When using mixed precision training with `float16`, `bnb_4bit_quant_storage_dtype` should be set to `float32` for stable automatic mixed precision training.
In terms of training code, the important code changes are:
```diff
...
bnb_config = BitsAndBytesConfig(
load_in_4bit=args.use_4bit_quantization,
bnb_4bit_quant_type=args.bnb_4bit_quant_type,
bnb_4bit_compute_dtype=compute_dtype,
bnb_4bit_use_double_quant=args.use_nested_quant,
+ bnb_4bit_quant_storage=quant_storage_dtype,
)
...
model = AutoModelForCausalLM.from_pretrained(
args.model_name_or_path,
quantization_config=bnb_config,
trust_remote_code=True,
attn_implementation="flash_attention_2" if args.use_flash_attn else "eager",
+ torch_dtype=quant_storage_dtype or torch.float32,
)
```
Notice that `torch_dtype` for `AutoModelForCausalLM` is the same as the `bnb_4bit_quant_storage` data type. That's it. Everything else is handled by Trainer and TRL.
## Memory usage
In the above example, the memory consumed per GPU is **19.6 GB** while CPU RAM usage is around **107 GB**. When disabling CPU offloading, the GPU memory usage is **35.6 GB per GPU**. Therefore, what took 16x 80GB GPUs for full finetuning, 8x 80GB GPUs with FSDP+LoRA, and a couple of 80GB GPUs with DDP+QLoRA, now requires 2x 24GB GPUs. This makes finetuning of large models more accessible.
## More resources
You can also refer to the [llama-recipes](https://github.com/facebookresearch/llama-recipes/?tab=readme-ov-file#fine-tuning) repo and the [Getting started with Llama](https://llama.meta.com/get-started/#fine-tuning) guide on how to finetune using FSDP and PEFT.
## Caveats
1. Merging when using PEFT and FSDP is currently unsupported and will raise an error.
2. Passing the `modules_to_save` config parameter is untested at present.
3. GPU memory savings when using CPU offloading are untested at present.
4. When using FSDP+QLoRA, `paged_adamw_8bit` currently results in an error when saving a checkpoint.
5. DoRA training with FSDP should work (albeit at lower speed than LoRA). If combined with bitsandbytes (QDoRA), 4-bit quantization should also work, but 8-bit quantization has known issues and is not recommended.


@ -0,0 +1,107 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Adapters
Adapter-based methods add extra trainable parameters after the attention and fully-connected layers of a frozen pretrained model to reduce memory usage and speed up training. The method varies depending on the adapter; it could simply be an extra added layer, or it could express the weight updates ∆W as a low-rank decomposition of the weight matrix. Either way, the adapters are typically small but demonstrate comparable performance to a fully finetuned model and enable training larger models with fewer resources.
This guide will give you a brief overview of the adapter methods supported by PEFT (if you're interested in learning more details about a specific method, take a look at the linked paper).
## Low-Rank Adaptation (LoRA)
<Tip>
LoRA is one of the most popular PEFT methods and a good starting point if you're just getting started with PEFT. It was originally developed for large language models but it is a tremendously popular training method for diffusion models because of its efficiency and effectiveness.
</Tip>
As mentioned briefly earlier, [LoRA](https://hf.co/papers/2106.09685) is a technique that accelerates finetuning large models while consuming less memory.
LoRA represents the weight updates ∆W with two smaller matrices (called *update matrices*) through low-rank decomposition. These new matrices can be trained to adapt to the new data while keeping the overall number of parameters low. The original weight matrix remains frozen and doesn't receive any further updates. To produce the final results, the original and extra adapted weights are combined. You could also merge the adapter weights with the base model to eliminate inference latency.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_animated.gif"/>
</div>
This approach has a number of advantages:
* LoRA makes finetuning more efficient by drastically reducing the number of trainable parameters.
* The original pretrained weights are kept frozen, which means you can have multiple lightweight and portable LoRA models for various downstream tasks built on top of them.
* LoRA is orthogonal to other parameter-efficient methods and can be combined with many of them.
* Performance of models finetuned using LoRA is comparable to the performance of fully finetuned models.
In principle, LoRA can be applied to any subset of weight matrices in a neural network to reduce the number of trainable parameters. However, for simplicity and further parameter efficiency, LoRA is typically only applied to the attention blocks in Transformer models. The resulting number of trainable parameters in a LoRA model depends on the size of the update matrices, which is determined mainly by the rank `r` and the shape of the original weight matrix.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora.png"/>
</div>
<small><a href="https://hf.co/papers/2103.10385">Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation</a></small>
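To make the decomposition concrete, here is a minimal framework-only sketch (shapes, scaling, and initialization are illustrative) of how the frozen weight and the two update matrices combine:
```python
import torch

d, k, r = 768, 768, 8            # original weight shape and LoRA rank
W = torch.randn(d, k)            # frozen pretrained weight (not updated)
A = torch.randn(r, k) * 0.01     # update matrix A (trainable)
B = torch.zeros(d, r)            # update matrix B (trainable, zero-initialized so the update starts at 0)
alpha = 16                       # scaling factor

delta_W = (alpha / r) * (B @ A)  # low-rank weight update
W_adapted = W + delta_W          # merged weight, equivalent to merging the adapter for inference
```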
## Mixture of LoRA Experts (X-LoRA)
[X-LoRA](https://arxiv.org/abs/2402.07148) is a mixture of experts method for LoRA which works by using dense or sparse gating to dynamically activate LoRA experts. The LoRA experts as well as the base model are frozen during training, resulting in a low parameter count as only the gating layers must be trained. In particular, the gating layers output scalings which (depending on config) are granular on the layer and token level. Additionally, during inference, X-LoRA dynamically activates LoRA adapters to recall knowledge and effectively mix them:
The below graphic demonstrates how the scalings change for different prompts for each token. This highlights the activation of different adapters as the generation progresses and the sequence creates new context.
![Token-by-token scalings](https://github.com/EricLBuehler/xlora/raw/master/res/token_by_token_scalings.gif)
For each step, X-LoRA requires the base model to be run twice: first, to get the hidden states without any LoRA adapters; second, the hidden states are used to calculate the scalings that are applied to the LoRA adapters, and the model is run again. The output of the second run is the result of the model step.
Ultimately, X-LoRA allows the model to reflect upon its knowledge because of the dual forward pass scheme, and to dynamically reconfigure the architecture.
## Low-Rank Hadamard Product (LoHa)
Low-rank decomposition can impact performance because the weight updates are limited to the low-rank space, which can constrain a model's expressiveness. However, you don't necessarily want to use a larger rank because it increases the number of trainable parameters. To address this, [LoHa](https://huggingface.co/papers/2108.06098) (a method originally developed for computer vision) was applied to diffusion models where the ability to generate diverse images is an important consideration. LoHa should also work with general model types, but the embedding layers aren't currently implemented in PEFT.
LoHa uses the [Hadamard product](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) (element-wise product) instead of the matrix product. ∆W is represented by four smaller matrices instead of two - like in LoRA - and each pair of these low-rank matrices is combined with the Hadamard product. As a result, ∆W can have the same number of trainable parameters but a higher rank and expressivity.
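Under the same illustrative shapes as the LoRA sketch above, the LoHa update could look like this:
```python
import torch

d, k, r = 768, 768, 8
B1, A1 = torch.randn(d, r), torch.randn(r, k)
B2, A2 = torch.randn(d, r), torch.randn(r, k)

# LoHa combines two low-rank products element-wise (Hadamard product),
# which can reach a higher effective rank than a single B @ A
delta_W = (B1 @ A1) * (B2 @ A2)
```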
## Low-Rank Kronecker Product (LoKr)
[LoKr](https://hf.co/papers/2309.14859) is very similar to LoRA and LoHa, and it is also mainly applied to diffusion models, though you could also use it with other model types. LoKr replaces the matrix product with the [Kronecker product](https://en.wikipedia.org/wiki/Kronecker_product) instead. The Kronecker product decomposition creates a block matrix which preserves the rank of the original weight matrix. Another benefit of the Kronecker product is that it can be vectorized by stacking the matrix columns. This can speed up the process because you're avoiding fully reconstructing ∆W.
## Orthogonal Finetuning (OFT)
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/oft.png"/>
</div>
<small><a href="https://hf.co/papers/2306.07280">Controlling Text-to-Image Diffusion by Orthogonal Finetuning</a></small>
[OFT](https://hf.co/papers/2306.07280) is a method that primarily focuses on preserving a pretrained model's generative performance in the finetuned model. It tries to maintain the same cosine similarity (hyperspherical energy) between all pairwise neurons in a layer because this better captures the semantic information among neurons. This means OFT is more capable at preserving the subject and it is better for controllable generation (similar to [ControlNet](https://huggingface.co/docs/diffusers/using-diffusers/controlnet)).
OFT preserves the hyperspherical energy by learning an orthogonal transformation for neurons to keep the cosine similarity between them unchanged. In practice, this means taking the matrix product of an orthogonal matrix with the pretrained weight matrix. However, to be parameter-efficient, the orthogonal matrix is represented as a block-diagonal matrix with rank `r` blocks. Whereas LoRA reduces the number of trainable parameters with low-rank structures, OFT reduces the number of trainable parameters with a sparse block-diagonal matrix structure.
## Orthogonal Butterfly (BOFT)
[BOFT](https://hf.co/papers/2311.06243) generalizes OFT and, like it, primarily focuses on preserving a pretrained model's generative performance in the finetuned model. It also tries to maintain the same cosine similarity (hyperspherical energy) between all pairwise neurons in a layer because this better captures the semantic information among neurons, which makes it more capable at preserving the subject and better for controllable generation (similar to [ControlNet](https://huggingface.co/docs/diffusers/using-diffusers/controlnet)).
Like OFT, BOFT learns an orthogonal transformation for neurons to keep the cosine similarity between them unchanged, which in practice means taking the matrix product of an orthogonal matrix with the pretrained weight matrix. Instead of OFT's block-diagonal parameterization, BOFT factorizes the orthogonal matrix with a butterfly factorization, which further improves parameter efficiency and finetuning flexibility.
## Adaptive Low-Rank Adaptation (AdaLoRA)
[AdaLoRA](https://hf.co/papers/2303.10512) manages the parameter budget introduced from LoRA by allocating more parameters - in other words, a higher rank `r` - for important weight matrices that are better adapted for a task and pruning less important ones. The rank is controlled by a method similar to singular value decomposition (SVD). The ∆W is parameterized with two orthogonal matrices and a diagonal matrix which contains singular values. This parametrization method avoids iteratively applying SVD which is computationally expensive. Based on this method, the rank of ∆W is adjusted according to an importance score. ∆W is divided into triplets and each triplet is scored according to its contribution to model performance. Triplets with low importance scores are pruned and triplets with high importance scores are kept for finetuning.
## Llama-Adapter
[Llama-Adapter](https://hf.co/papers/2303.16199) is a method for adapting Llama into an instruction-following model. To help adapt the model for instruction-following, the adapter is trained with a 52K instruction-output dataset.
A set of learnable adaption prompts is prefixed to the input instruction tokens. These are inserted into the upper layers of the model because it is better to learn with the higher-level semantics of the pretrained model. The instruction-output tokens prefixed to the input guide the adaption prompt to generate a contextual response.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/llama-adapter.png"/>
</div>
<small><a href="https://hf.co/papers/2303.16199">LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention</a></small>
To avoid adding noise to the tokens, the adapter uses zero-initialized attention. On top of this, the adapter adds a learnable gating factor (initialized with zeros) to progressively add information to the model during training. This prevents overwhelming the model's pretrained knowledge with the newly learned instructions.


@ -0,0 +1,68 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# IA3
This conceptual guide gives a brief overview of [IA3](https://arxiv.org/abs/2205.05638), a parameter-efficient fine tuning technique that is
intended to improve over [LoRA](./lora).
To make fine-tuning more efficient, IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations)
rescales inner activations with learned vectors. These learned vectors are injected in the attention and feedforward modules
in a typical transformer-based architecture. These learned vectors are the only trainable parameters during fine-tuning, and thus the original
weights remain frozen. Dealing with learned vectors (as opposed to learned low-rank updates to a weight matrix like LoRA)
keeps the number of trainable parameters much smaller.
Being similar to LoRA, IA3 carries many of the same advantages:
* IA3 makes fine-tuning more efficient by drastically reducing the number of trainable parameters. (For T0, an IA3 model only has about 0.01% trainable parameters, while even LoRA has > 0.1%)
* The original pre-trained weights are kept frozen, which means you can have multiple lightweight and portable IA3 models for various downstream tasks built on top of them.
* Performance of models fine-tuned using IA3 is comparable to the performance of fully fine-tuned models.
* IA3 does not add any inference latency because adapter weights can be merged with the base model.
In principle, IA3 can be applied to any subset of weight matrices in a neural network to reduce the number of trainable
parameters. Following the authors' implementation, IA3 weights are added to the key, value and feedforward layers
of a Transformer model. To be specific, for transformer models, IA3 weights are added to the outputs of key and value layers, and to the input of the second feedforward layer
in each transformer block.
Given the target layers for injecting IA3 parameters, the number of trainable parameters
can be determined based on the size of the weight matrices.
## Common IA3 parameters in PEFT
As with other methods supported by PEFT, to fine-tune a model using IA3, you need to:
1. Instantiate a base model.
2. Create a configuration (`IA3Config`) where you define IA3-specific parameters.
3. Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
4. Train the `PeftModel` as you normally would train the base model.
`IA3Config` allows you to control how IA3 is applied to the base model through the following parameters:
- `target_modules`: The modules (for example, attention blocks) to apply the IA3 vectors.
- `feedforward_modules`: The list of modules to be treated as feedforward layers in `target_modules`. While learned vectors are multiplied with
the output activation for attention blocks, the vectors are multiplied with the input for classic feedforward layers. Note that `feedforward_modules` must be a subset of `target_modules`.
- `modules_to_save`: List of modules apart from IA3 layers to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.
## Example Usage
For the task of sequence classification, one can initialize the IA3 config for a Llama model as follows:
```py
peft_config = IA3Config(
task_type=TaskType.SEQ_CLS, target_modules=["k_proj", "v_proj", "down_proj"], feedforward_modules=["down_proj"]
)
```
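Wrapping the base model then follows the usual PEFT workflow. The sketch below assumes a hypothetical sequence-classification checkpoint and label count; substitute your own model:
```py
from transformers import AutoModelForSequenceClassification
from peft import IA3Config, TaskType, get_peft_model

# placeholder base model and label count
base_model = AutoModelForSequenceClassification.from_pretrained("meta-llama/Llama-2-7b-hf", num_labels=2)

peft_config = IA3Config(
    task_type=TaskType.SEQ_CLS, target_modules=["k_proj", "v_proj", "down_proj"], feedforward_modules=["down_proj"]
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()
```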


@ -0,0 +1,107 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Orthogonal Finetuning (OFT and BOFT)
This conceptual guide gives a brief overview of [OFT](https://arxiv.org/abs/2306.07280) and [BOFT](https://arxiv.org/abs/2311.06243), two parameter-efficient fine-tuning techniques that use an orthogonal matrix to multiplicatively transform the pretrained weight matrices.
To achieve efficient fine-tuning, OFT represents the weight updates with an orthogonal transformation. The orthogonal transformation is parameterized by an orthogonal matrix multiplied to the pretrained weight matrix. These new matrices can be trained to adapt to the new data while keeping the overall number of changes low. The original weight matrix remains frozen and doesn't receive any further adjustments. To produce the final results, both the original and the adapted weights are multiplied together.
Orthogonal Butterfly (BOFT) generalizes OFT with Butterfly factorization and further improves its parameter efficiency and finetuning flexibility. In short, OFT can be viewed as a special case of BOFT. Different from LoRA that uses additive low-rank weight updates, BOFT uses multiplicative orthogonal weight updates. The comparison is shown below.
<div class="flex justify-center">
<img src="https://raw.githubusercontent.com/wy1iu/butterfly-oft/main/assets/BOFT_comparison.png"/>
</div>
BOFT has some advantages compared to LoRA:
* BOFT proposes a simple yet generic way to finetune pretrained models to downstream tasks, yielding a better preservation of pretraining knowledge and a better parameter efficiency.
* Through the orthogonality, BOFT introduces a structural constraint, i.e., keeping the [hyperspherical energy](https://arxiv.org/abs/1805.09298) unchanged during finetuning. This can effectively reduce the forgetting of pretraining knowledge.
* BOFT uses the butterfly factorization to efficiently parameterize the orthogonal matrix, which yields a compact yet expressive learning space (i.e., hypothesis class).
* The sparse matrix decomposition in BOFT brings in additional inductive biases that are beneficial to generalization.
In principle, BOFT can be applied to any subset of weight matrices in a neural network to reduce the number of trainable parameters. Given the target layers for injecting BOFT parameters, the number of trainable parameters can be determined based on the size of the weight matrices.
## Merge OFT/BOFT weights into the base model
Similar to LoRA, the weights learned by OFT/BOFT can be integrated into the pretrained weight matrices using the `merge_and_unload()` function. This function merges the adapter weights with the base model, which allows you to effectively use the newly merged model as a standalone model.
<div class="flex justify-center">
<img src="https://raw.githubusercontent.com/wy1iu/butterfly-oft/main/assets/boft_merge.png"/>
</div>
This works because during training, the orthogonal weight matrix (R in the diagram above) and the pretrained weight matrices are separate. But once training is complete, these weights can actually be merged (multiplied) into a new weight matrix that is equivalent.
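A minimal sketch of what merging looks like in code (the base model checkpoint and adapter path are placeholders):
```py
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")        # placeholder base model
model = PeftModel.from_pretrained(base_model, "path/to/oft-or-boft-adapter")  # placeholder adapter path

# multiplies the learned orthogonal matrices into the pretrained weights and
# returns a plain transformers model with no adapter modules attached
merged_model = model.merge_and_unload()
```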
## Utils for OFT / BOFT
### Common OFT / BOFT parameters in PEFT
As with other methods supported by PEFT, to fine-tune a model using OFT or BOFT, you need to:
1. Instantiate a base model.
2. Create a configuration (`OFTConfig` or `BOFTConfig`) where you define OFT/BOFT-specific parameters.
3. Wrap the base model with `get_peft_model()` to get a trainable `PeftModel`.
4. Train the `PeftModel` as you normally would train the base model.
### BOFT-specific parameters
`BOFTConfig` allows you to control how OFT/BOFT is applied to the base model through the following parameters:
- `boft_block_size`: the BOFT matrix block size across different layers, expressed in `int`. Smaller block sizes result in sparser update matrices with fewer trainable parameters. **Note**, please choose `boft_block_size` so that it divides most layers' input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, and do not leave both at 0, because `boft_block_size` x `boft_block_num` must equal the layer's input dimension.
- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed in `int`. Fewer blocks result in sparser update matrices with fewer trainable parameters. **Note**, please choose `boft_block_num` so that it divides most layers' input dimension (`in_features`), e.g., 4, 8, 16. Also, please only
specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, and do not leave both at 0, because `boft_block_size` x `boft_block_num` must equal the layer's input dimension.
- `boft_n_butterfly_factor`: the number of butterfly factors. **Note**, for `boft_n_butterfly_factor=1`, BOFT is the same as vanilla OFT; for `boft_n_butterfly_factor=2`, the effective block size of OFT becomes twice as big and the number of blocks becomes half.
- `bias`: specify if the `bias` parameters should be trained. Can be `"none"`, `"all"` or `"boft_only"`.
- `boft_dropout`: specify the probability of multiplicative dropout.
- `target_modules`: The modules (for example, attention blocks) to inject the OFT/BOFT matrices.
- `modules_to_save`: List of modules apart from OFT/BOFT matrices to be set as trainable and saved in the final checkpoint. These typically include the model's custom head that is randomly initialized for the fine-tuning task.
## BOFT Example Usage
For examples of applying BOFT to various downstream tasks, take a look at the following step-by-step guides:
- [Dreambooth finetuning with BOFT](../task_guides/boft_dreambooth)
- [Controllable generation finetuning with BOFT (ControlNet)](../task_guides/boft_controlnet)
For the task of image classification, one can initialize the BOFT config for a DinoV2 model as follows:
```py
import transformers
from peft import BOFTConfig, get_peft_model
config = BOFTConfig(
boft_block_size=4,
boft_n_butterfly_factor=2,
target_modules=["query", "value", "key", "output.dense", "mlp.fc1", "mlp.fc2"],
boft_dropout=0.1,
bias="boft_only",
modules_to_save=["classifier"],
)
model = transformers.Dinov2ForImageClassification.from_pretrained(
"facebook/dinov2-large",
num_labels=100,
)
boft_model = get_peft_model(model, config)
```


@ -0,0 +1,77 @@
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Soft prompts
Training large pretrained language models is very time-consuming and compute-intensive. As they continue to grow in size, there is increasing interest in more efficient training methods such as *prompting*. Prompting primes a frozen pretrained model for a specific downstream task by including a text prompt that describes the task or even demonstrates an example of the task. With prompting, you can avoid fully training a separate model for each downstream task, and use the same frozen pretrained model instead. This is a lot easier because you can use the same model for several different tasks, and it is significantly more efficient to train and store a smaller set of prompt parameters than to train all the model's parameters.
There are two categories of prompting methods:
- hard prompts are manually handcrafted text prompts with discrete input tokens; the downside is that it requires a lot of effort to create a good prompt
- soft prompts are learnable tensors concatenated with the input embeddings that can be optimized to a dataset; the downside is that they aren't human readable because you aren't matching these "virtual tokens" to the embeddings of a real word
This conceptual guide provides a brief overview of the soft prompt methods included in 🤗 PEFT: prompt tuning, prefix tuning, P-tuning, and multitask prompt tuning.
## Prompt tuning
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/prompt-tuning.png"/>
</div>
<small>Only train and store a significantly smaller set of task-specific prompt parameters <a href="https://hf.co/papers/2104.08691">(image source)</a>.</small>
[Prompt tuning](https://hf.co/papers/2104.08691) was developed for text classification tasks on T5 models, and all downstream tasks are cast as a text generation task. For example, sequence classification usually assigns a single class label to a sequence of text. By casting it as a text generation task, the tokens that make up the class label are *generated*. Prompts are added to the input as a series of tokens. Typically, the model parameters are fixed which means the prompt tokens are also fixed by the model parameters.
The key idea behind prompt tuning is that prompt tokens have their own parameters that are updated independently. This means you can keep the pretrained model's parameters frozen, and only update the gradients of the prompt token embeddings. The results are comparable to the traditional method of training the entire model, and prompt tuning performance scales as model size increases.
Take a look at [Prompt tuning for causal language modeling](../task_guides/clm-prompt-tuning) for a step-by-step guide on how to train a model with prompt tuning.
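As a brief sketch of the API (the T5 checkpoint and number of virtual tokens below are illustrative choices, not prescriptions):
```py
from transformers import AutoModelForSeq2SeqLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # placeholder checkpoint

peft_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    num_virtual_tokens=20,                       # number of trainable prompt tokens prepended to the input
    prompt_tuning_init=PromptTuningInit.RANDOM,  # soft prompt embeddings initialized randomly
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()  # only the prompt embeddings are trainable
```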
## Prefix tuning
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/prefix-tuning.png"/>
</div>
<small>Optimize the prefix parameters for each task <a href="https://hf.co/papers/2101.00190">(image source)</a>.</small>
[Prefix tuning](https://hf.co/papers/2101.00190) was designed for natural language generation (NLG) tasks on GPT models. It is very similar to prompt tuning; prefix tuning also prepends a sequence of task-specific vectors to the input that can be trained and updated while keeping the rest of the pretrained model's parameters frozen.
The main difference is that the prefix parameters are inserted in **all** of the model layers, whereas prompt tuning only adds the prompt parameters to the model input embeddings. The prefix parameters are also optimized by a separate feed-forward network (FFN) instead of training directly on the soft prompts because it causes instability and hurts performance. The FFN is discarded after updating the soft prompts.
As a result, the authors found that prefix tuning demonstrates comparable performance to fully finetuning a model, despite having 1000x fewer parameters, and it performs even better in low-data settings.
Take a look at [Prefix tuning for conditional generation](../task_guides/seq2seq-prefix-tuning) for a step-by-step guide on how to train a model with prefix tuning.
## P-tuning
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/p-tuning.png"/>
</div>
<small>Prompt tokens can be inserted anywhere in the input sequence, and they are optimized by a prompt encoder <a href="https://hf.co/papers/2103.10385">(image source)</a>.</small>
[P-tuning](https://hf.co/papers/2103.10385) is designed for natural language understanding (NLU) tasks and all language models.
It is another variation of a soft prompt method; P-tuning also adds a trainable embedding tensor that can be optimized to find better prompts, and it uses a prompt encoder (a bidirectional long short-term memory network, or LSTM) to optimize the prompt parameters. Unlike prefix tuning though:
- the prompt tokens can be inserted anywhere in the input sequence, and it isn't restricted to only the beginning
- the prompt tokens are only added to the input instead of adding them to every layer of the model
- introducing *anchor* tokens can improve performance because they indicate characteristics of a component in the input sequence
The results suggest that P-tuning is more efficient than manually crafting prompts, and it enables GPT-like models to compete with BERT-like models on NLU tasks.
Take a look at [P-tuning for sequence classification](../task_guides/ptuning-seq-classification) for a step-by-step guide on how to train a model with P-tuning.
## Multitask prompt tuning
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/mpt.png"/>
</div>
<small><a href="https://hf.co/papers/2303.02861">Multitask prompt tuning enables parameter-efficient transfer learning</a>.</small>
[Multitask prompt tuning (MPT)](https://hf.co/papers/2303.02861) learns a single prompt from data for multiple task types that can be shared for different target tasks. Other existing approaches learn a separate soft prompt for each task that needs to be retrieved or aggregated for adaptation to target tasks. MPT consists of two stages:
1. source training - for each task, its soft prompt is decomposed into task-specific vectors. The task-specific vectors are multiplied together to form another matrix W, and the Hadamard product is used between W and a shared prompt matrix P to generate a task-specific prompt matrix. The task-specific prompts are distilled into a single prompt matrix that is shared across all tasks. This prompt is trained with multitask training.
2. target adaptation - to adapt the single prompt for a target task, a target prompt is initialized and expressed as the Hadamard product of the shared prompt matrix and the task-specific low-rank prompt matrix.
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/mpt-decomposition.png"/>
</div>
<small><a href="https://hf.co/papers/2103.10385">Prompt decomposition</a>.</small>


@ -0,0 +1,250 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT checkpoint format
This document describes how PEFT's checkpoint files are structured and how to convert between the PEFT format and other formats.
## PEFT files
PEFT (parameter-efficient fine-tuning) methods only update a small subset of a model's parameters rather than all of them. This is nice because checkpoint files can generally be much smaller than the original model files and are easier to store and share. However, this also means that to load a PEFT model, you need to have the original model available as well.
When you call [`~PeftModel.save_pretrained`] on a PEFT model, the PEFT model saves three files, described below:
1. `adapter_model.safetensors` or `adapter_model.bin`
By default, the model is saved in the `safetensors` format, a secure alternative to the `bin` format, which is known to be susceptible to [security vulnerabilities](https://huggingface.co/docs/hub/security-pickle) because it uses the pickle utility under the hood. Both formats store the same `state_dict` though, and are interchangeable.
The `state_dict` only contains the parameters of the adapter module, not the base model. To illustrate the difference in size, a normal BERT model requires ~420MB of disk space, whereas an IA³ adapter on top of this BERT model only requires ~260KB.
2. `adapter_config.json`
The `adapter_config.json` file contains the configuration of the adapter module, which is necessary to load the model. Below is an example of an `adapter_config.json` for an IA³ adapter with standard settings applied to a BERT model:
```json
{
"auto_mapping": {
"base_model_class": "BertModel",
"parent_library": "transformers.models.bert.modeling_bert"
},
"base_model_name_or_path": "bert-base-uncased",
"fan_in_fan_out": false,
"feedforward_modules": [
"output.dense"
],
"inference_mode": true,
"init_ia3_weights": true,
"modules_to_save": null,
"peft_type": "IA3",
"revision": null,
"target_modules": [
"key",
"value",
"output.dense"
],
"task_type": null
}
```
The configuration file contains:
- the adapter module type stored, `"peft_type": "IA3"`
- information about the base model like `"base_model_name_or_path": "bert-base-uncased"`
- the revision of the model (if any), `"revision": null`
If the base model is not a pretrained Transformers model, the latter two entries will be `null`. Other than that, the settings are all related to the specific IA³ adapter that was used to fine-tune the model.
3. `README.md`
The generated `README.md` is the model card of a PEFT model and contains a few pre-filled entries. The intent of this is to make it easier to share the model with others and to provide some basic information about the model. This file is not needed to load the model.
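To see these files being produced and consumed, here is a minimal sketch using the IA³-on-BERT configuration shown above (the output directory name is a placeholder):
```python
from transformers import AutoModel
from peft import IA3Config, PeftModel, get_peft_model

base_model = AutoModel.from_pretrained("bert-base-uncased")
config = IA3Config(target_modules=["key", "value", "output.dense"], feedforward_modules=["output.dense"])
peft_model = get_peft_model(base_model, config)

# writes adapter_model.safetensors, adapter_config.json, and README.md
peft_model.save_pretrained("bert-ia3-adapter")

# to load the adapter later, the base model is needed again
base_model = AutoModel.from_pretrained("bert-base-uncased")
peft_model = PeftModel.from_pretrained(base_model, "bert-ia3-adapter")
```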
## Convert to PEFT format
When converting from another format to the PEFT format, we require both the `adapter_model.safetensors` (or `adapter_model.bin`) file and the `adapter_config.json` file.
### adapter_model
For the model weights, it is important to use the correct mapping from parameter name to value for PEFT to load the file. Getting this mapping right is an exercise in checking the implementation details, as there is no generally agreed upon format for PEFT adapters.
Fortunately, figuring out this mapping is not overly complicated for common base cases. Let's look at a concrete example, the [`LoraLayer`](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py):
```python
# showing only part of the code
class LoraLayer(BaseTunerLayer):
# All names of layers that may contain (trainable) adapter weights
adapter_layer_names = ("lora_A", "lora_B", "lora_embedding_A", "lora_embedding_B")
# All names of other parameters that may contain adapter-related parameters
other_param_names = ("r", "lora_alpha", "scaling", "lora_dropout")
def __init__(self, base_layer: nn.Module, **kwargs) -> None:
self.base_layer = base_layer
self.r = {}
self.lora_alpha = {}
self.scaling = {}
self.lora_dropout = nn.ModuleDict({})
self.lora_A = nn.ModuleDict({})
self.lora_B = nn.ModuleDict({})
# For Embedding layer
self.lora_embedding_A = nn.ParameterDict({})
self.lora_embedding_B = nn.ParameterDict({})
# Mark the weight as unmerged
self._disable_adapters = False
self.merged_adapters = []
self.use_dora: dict[str, bool] = {}
self.lora_magnitude_vector: Optional[torch.nn.ParameterDict] = None # for DoRA
self._caches: dict[str, Any] = {}
self.kwargs = kwargs
```
In the `__init__` code used by all `LoraLayer` classes in PEFT, there are a bunch of parameters used to initialize the model, but only a few are relevant for the checkpoint file: `lora_A`, `lora_B`, `lora_embedding_A`, and `lora_embedding_B`. These parameters are listed in the class attribute `adapter_layer_names` and contain the learnable parameters, so they must be included in the checkpoint file. All the other parameters, like the rank `r`, are derived from the `adapter_config.json` and must be included there (unless the default value is used).
Let's check the `state_dict` of a PEFT LoRA model applied to BERT. When printing the first five keys using the default LoRA settings (the remaining keys are the same, just with different layer numbers), we get:
- `base_model.model.encoder.layer.0.attention.self.query.lora_A.weight`
- `base_model.model.encoder.layer.0.attention.self.query.lora_B.weight`
- `base_model.model.encoder.layer.0.attention.self.value.lora_A.weight`
- `base_model.model.encoder.layer.0.attention.self.value.lora_B.weight`
- `base_model.model.encoder.layer.1.attention.self.query.lora_A.weight`
- etc.
Let's break this down:
- By default, for BERT models, LoRA is applied to the `query` and `value` layers of the attention module. This is why you see `attention.self.query` and `attention.self.value` in the key names for each layer.
- LoRA decomposes the weights into two low-rank matrices, `lora_A` and `lora_B`. This is where `lora_A` and `lora_B` come from in the key names.
- These LoRA matrices are implemented as `nn.Linear` layers, so the parameters are stored in the `.weight` attribute (`lora_A.weight`, `lora_B.weight`).
- By default, LoRA isn't applied to BERT's embedding layer, so there are _no entries_ for `lora_embedding_A` and `lora_embedding_B`.
- The keys of the `state_dict` always start with `"base_model.model."`. The reason is that, in PEFT, we wrap the base model inside a tuner-specific model (`LoraModel` in this case), which itself is wrapped in a general PEFT model (`PeftModel`). For this reason, these two prefixes are added to the keys. When converting to the PEFT format, it is required to add these prefixes.
<Tip>
This last point is not true for prefix tuning techniques like prompt tuning. There, the extra embeddings are directly stored in the `state_dict` without any prefixes added to the keys.
</Tip>
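If you want to reproduce this key listing yourself, the following minimal sketch should work (the model id and the plain default `LoraConfig` are just for illustration):
```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model, get_peft_model_state_dict

base_model = AutoModel.from_pretrained("bert-base-uncased")
peft_model = get_peft_model(base_model, LoraConfig())  # default LoRA settings for BERT
# keys as they would appear in adapter_model.safetensors (the adapter name is already stripped)
for key in list(get_peft_model_state_dict(peft_model))[:5]:
    print(key)
```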
When inspecting the parameter names in the loaded model, you might be surprised to find that they look a bit different, e.g. `base_model.model.encoder.layer.0.attention.self.query.lora_A.default.weight`. The difference is the *`.default`* part in the second to last segment. This part exists because PEFT generally allows the addition of multiple adapters at once (using an `nn.ModuleDict` or `nn.ParameterDict` to store them). For example, if you add another adapter called "other", the key for that adapter would be `base_model.model.encoder.layer.0.attention.self.query.lora_A.other.weight`.
When you call [`~PeftModel.save_pretrained`], the adapter name is stripped from the keys. The reason is that the adapter name is not an important part of the model architecture; it is just an arbitrary name. When loading the adapter, you could choose a totally different name, and the model would still work the same way. This is why the adapter name is not stored in the checkpoint file.
<Tip>
If you call `save_pretrained("some/path")` and the adapter name is not `"default"`, the adapter is stored in a sub-directory with the same name as the adapter. So if the name is "other", it would be stored inside of `some/path/other`.
</Tip>
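Continuing the hypothetical BERT example from above, this is roughly what saving a second, non-default adapter could look like:
```python
# add a second adapter called "other" next to the existing "default" adapter
peft_model.add_adapter("other", LoraConfig())
peft_model.save_pretrained("some/path")
# "default" is written to some/path/, while "other" ends up in some/path/other/
```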
In some circumstances, deciding which values to add to the checkpoint file can become a bit more complicated. For example, in PEFT, DoRA is implemented as a special case of LoRA. If you want to convert a DoRA model to PEFT, you should create a LoRA checkpoint with extra entries for DoRA. You can see this in the `__init__` of the previous `LoraLayer` code:
```python
self.lora_magnitude_vector: Optional[torch.nn.ParameterDict] = None # for DoRA
```
This indicates that there is an optional extra parameter per layer for DoRA.
### adapter_config
All the other information needed to load a PEFT model is contained in the `adapter_config.json` file. Let's check this file for a LoRA model applied to BERT:
```json
{
  "alpha_pattern": {},
  "auto_mapping": {
    "base_model_class": "BertModel",
    "parent_library": "transformers.models.bert.modeling_bert"
  },
  "base_model_name_or_path": "bert-base-uncased",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layer_replication": null,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 8,
  "lora_dropout": 0.0,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "query",
    "value"
  ],
  "task_type": null,
  "use_dora": false,
  "use_rslora": false
}
```
This contains a lot of entries, and at first glance, it could feel overwhelming to figure out all the right values to put in there. However, most of the entries are not necessary to load the model. This is either because they use the default values and don't need to be added or because they only affect the initialization of the LoRA weights, which is irrelevant when it comes to loading the model. If you find that you don't know what a specific parameter does, e.g. `"use_rslora"`, don't add it, and you should be fine. Also note that as more options are added, this file will get more entries in the future, but it should be backward compatible.
At the minimum, you should include the following entries:
```json
{
  "target_modules": ["query", "value"],
  "peft_type": "LORA"
}
```
However, adding as many entries as possible, like the rank `r` or the `base_model_name_or_path` (if it's a Transformers model) is recommended. This information can help others understand the model better and share it more easily. To check which keys and values are expected, check out the [config.py](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/config.py) file (as an example, this is the config file for LoRA) in the PEFT source code.
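To make this concrete, below is a rough, hedged sketch of converting a non-PEFT LoRA checkpoint into the PEFT format. The source file name, the key layout, and the chosen config values are assumptions and will depend on the original format:
```python
import json
import os

import torch
from safetensors.torch import save_file

os.makedirs("converted", exist_ok=True)

# hypothetical source checkpoint with keys like "encoder.layer.0.attention.self.query.lora_A.weight"
state_dict = torch.load("original_lora_checkpoint.bin")
# add the prefixes that PEFT expects (see the adapter_model section above)
renamed = {f"base_model.model.{key}": value for key, value in state_dict.items()}
save_file(renamed, "converted/adapter_model.safetensors")

# minimal config; add more fields (r, lora_alpha, base_model_name_or_path, ...) if you know them
adapter_config = {"peft_type": "LORA", "target_modules": ["query", "value"], "r": 8, "lora_alpha": 8}
with open("converted/adapter_config.json", "w") as f:
    json.dump(adapter_config, f, indent=2)
```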
## Model storage
In some circumstances, you might want to store the whole PEFT model, including the base weights. This can be necessary if, for instance, the base model is not available to the users trying to load the PEFT model. You can either merge the weights first or convert the PEFT model into a Transformers model.
### Merge the weights
The most straightforward way to store the whole PEFT model is to merge the adapter weights into the base weights:
```python
merged_model = model.merge_and_unload()
merged_model.save_pretrained(...)
```
There are some disadvantages to this approach, though:
- Once [`~LoraModel.merge_and_unload`] is called, you get a basic model without any PEFT-specific functionality. This means you can't use any of the PEFT-specific methods anymore.
- You cannot unmerge the weights, load multiple adapters at once, disable the adapter, etc.
- Not all PEFT methods support merging weights.
- Some PEFT methods may generally allow merging, but not with specific settings (e.g. when using certain quantization techniques).
- The whole model will be much larger than the PEFT model, as it will contain all the base weights as well.
But inference with a merged model should be a bit faster.
### Convert to a Transformers model
Another way to save the whole model, assuming the base model is a Transformers model, is to use a somewhat hacky approach: directly insert the PEFT weights into the base model and save it, which requires "tricking" Transformers into believing the PEFT model is not a PEFT model. This only works with LoRA because other adapters are not implemented in Transformers.
```python
from transformers import AutoModel

model = ...  # the PEFT model
...
# after you finish training the model, save it in a temporary location
model.save_pretrained(<temp_location>)
# now load this model directly into a transformers model, without the PEFT wrapper
# the PEFT weights are directly injected into the base model
model_loaded = AutoModel.from_pretrained(<temp_location>)
# now make the loaded model believe that it is _not_ a PEFT model
model_loaded._hf_peft_config_loaded = False
# now when we save it, it will save the whole model
model_loaded.save_pretrained(<final_location>)
# or upload to Hugging Face Hub
model_loaded.push_to_hub(<final_location>)
```

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Contribute to PEFT
We are happy to accept contributions to PEFT. If you plan to contribute, please read this to make the process as smooth as possible.
## Installation
For code contributions to PEFT, you should choose the ["source"](../install#source) installation method.
If you are new to creating a pull request, follow the [Creating a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request) guide by GitHub.
## Tests and code quality checks
Regardless of the contribution type (unless it's only about the docs), you should run tests and code quality checks before creating a PR to ensure your contribution doesn't break anything and follows the project standards.
We provide a Makefile to execute the necessary tests. Run the code below for the unit test:
```sh
make test
```
Run one of the following to either only check or check and fix code quality and style:
```sh
make quality # just check
make style # check and fix
```
You can also set up [`pre-commit`](https://pre-commit.com/) to run these fixes
automatically as Git commit hooks.
```bash
$ pip install pre-commit
$ pre-commit install
```
Running all the tests can take a couple of minutes, so during development it can be more efficient to only run tests specific to your change:
```sh
pytest tests/ -k <name-of-test>
```
This should finish much quicker and allow for faster iteration. However, you should still run the whole test suite before creating a PR because your change can inadvertently break tests that at first glance are unrelated.
If your change is specific to a hardware setting (e.g., it requires CUDA), take a look at [tests/test_gpu_examples.py](https://github.com/huggingface/peft/blob/1c1c7fdaa6e6abaa53939b865dee1eded82ad032/tests/test_gpu_examples.py) and [tests/test_common_gpu.py](https://github.com/huggingface/peft/blob/1c1c7fdaa6e6abaa53939b865dee1eded82ad032/tests/test_common_gpu.py) to see if it makes sense to add tests there. If your change could have an effect on saving and loading models, please run the tests with the `--regression` flag to trigger regression tests.
It can happen that while you're working on your PR, the underlying code base changes due to other changes being merged. If that happens, especially when there is a merge conflict, please update your branch with the latest changes. This can be a merge or a rebase, and we'll squash and merge the PR once it's ready.
## PR description
When opening a PR, please provide a nice description of the change you're proposing. If it relates to other issues or PRs, please reference them. Providing a good description not only helps the reviewers review your code better and faster, but it can also be used later (as a basis) for the commit message, which helps with the long-term maintenance of the project.
If your code makes some non-trivial changes, it may also be a good idea to add comments to the code to explain those changes. For example, if you had to iterate on your implementation multiple times because the most obvious way didn't work, it's a good indication that a code comment is needed.
## Bugfixes
Please give a description of the circumstances that led to the bug. If there is an existing issue, please link to it (e.g., “Resolves #12345”).
Ideally when a bugfix is provided, it should be accompanied by a test for the bug. The test should fail with the current code and pass with the bugfix. Add a comment to the test that references the issue or PR. Without a test, it is more difficult to prevent regressions in the future.
## Add a new fine-tuning method
New parameter-efficient fine-tuning methods are developed all the time. If you would like to add a new and promising method to PEFT, please follow these steps.
1. Before you start to implement the new method, please open a GitHub issue with your proposal. This way, the maintainers can give you some early feedback.
2. Please add a link to the source (usually a paper) of the method. Some evidence should be provided that there is general interest in using the method. We will not add new methods that are freshly published but for which there is no evidence of demand.
3. When implementing the method, it makes sense to look for existing implementations as a guide. Moreover, when you structure your code, please take inspiration from the other PEFT methods. For example, if your method is similar to LoRA, it makes sense to structure your code similarly or even reuse some functions or classes where it makes sense (some code duplication is okay, but don't overdo it).
4. Ideally, in addition to the implementation of the new method, there should also be examples (notebooks, scripts), documentation, and an extensive test suite that proves the method works with a variety of tasks. However, this can be more challenging, so it is acceptable to only provide the implementation and at least one working example; documentation and tests can be added in follow-up PRs.
5. Once you have something that seems to be working, don't hesitate to create a draft PR even if it's not in a mergeable state yet. The maintainers are happy to give you feedback and guidance along the way.
## Add other features
It is best if you first open an issue on GitHub with a proposal to add the new feature. This way, you can discuss with the maintainers if it makes sense to add the feature before spending too much time on implementing it.
New features should generally be accompanied by tests and documentation or examples. Without the latter, users will have a hard time discovering your cool new feature.
Changes to the code should be implemented in a backward-compatible way. For example, existing code should continue to work the same way after the feature is merged.

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Custom models
Some fine-tuning techniques, such as prompt tuning, are specific to language models. That means in 🤗 PEFT, it is
assumed a 🤗 Transformers model is being used. However, other fine-tuning techniques - like
[LoRA](../conceptual_guides/lora) - are not restricted to specific model types.
In this guide, we will see how LoRA can be applied to a multilayer perceptron, a computer vision model from the [timm](https://huggingface.co/docs/timm/index) library, or a new 🤗 Transformers architecture.
## Multilayer perceptron
Let's assume that we want to fine-tune a multilayer perceptron with LoRA. Here is the definition:
```python
from torch import nn
class MLP(nn.Module):
    def __init__(self, num_units_hidden=2000):
        super().__init__()
        self.seq = nn.Sequential(
            nn.Linear(20, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, num_units_hidden),
            nn.ReLU(),
            nn.Linear(num_units_hidden, 2),
            nn.LogSoftmax(dim=-1),
        )

    def forward(self, X):
        return self.seq(X)
```
This is a straightforward multilayer perceptron with an input layer, a hidden layer, and an output layer.
<Tip>
For this toy example, we choose an exceedingly large number of hidden units to highlight the efficiency gains
from PEFT, but those gains are in line with more realistic examples.
</Tip>
There are a few linear layers in this model that could be tuned with LoRA. When working with common 🤗 Transformers
models, PEFT will know which layers to apply LoRA to, but in this case, it is up to us as a user to choose the layers.
To determine the names of the layers to tune:
```python
print([(n, type(m)) for n, m in MLP().named_modules()])
```
This should print:
```
[('', __main__.MLP),
('seq', torch.nn.modules.container.Sequential),
('seq.0', torch.nn.modules.linear.Linear),
('seq.1', torch.nn.modules.activation.ReLU),
('seq.2', torch.nn.modules.linear.Linear),
('seq.3', torch.nn.modules.activation.ReLU),
('seq.4', torch.nn.modules.linear.Linear),
('seq.5', torch.nn.modules.activation.LogSoftmax)]
```
Let's say we want to apply LoRA to the input layer and to the hidden layer; those are `'seq.0'` and `'seq.2'`. Moreover, let's assume we want to update the output layer without LoRA; that would be `'seq.4'`. The corresponding config would be:
```python
from peft import LoraConfig
config = LoraConfig(
target_modules=["seq.0", "seq.2"],
modules_to_save=["seq.4"],
)
```
With that, we can create our PEFT model and check the fraction of parameters trained:
```python
from peft import get_peft_model
model = MLP()
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 56,164 || all params: 4,100,164 || trainable%: 1.369798866581922
```
Finally, we can use any training framework we like, or write our own fit loop, to train the `peft_model`.
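For instance, a bare-bones training loop could look like the sketch below; the toy data and hyperparameters are made up for illustration:
```python
import torch

# toy data matching the MLP's input size (20) and number of classes (2)
X = torch.rand(64, 20)
y = torch.randint(0, 2, (64,))

optimizer = torch.optim.Adam((p for p in peft_model.parameters() if p.requires_grad), lr=2e-3)
criterion = torch.nn.NLLLoss()  # the MLP ends with LogSoftmax

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(peft_model(X), y)
    loss.backward()
    optimizer.step()
```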
For a complete example, check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/multilayer_perceptron/multilayer_perceptron_lora.ipynb).
## timm models
The [timm](https://huggingface.co/docs/timm/index) library contains a large number of pretrained computer vision models.
Those can also be fine-tuned with PEFT. Let's check out how this works in practice.
To start, ensure that timm is installed in the Python environment:
```bash
python -m pip install -U timm
```
Next we load a timm model for an image classification task:
```python
import timm
num_classes = ...
model_id = "timm/poolformer_m36.sail_in1k"
model = timm.create_model(model_id, pretrained=True, num_classes=num_classes)
```
Again, we need to make a decision about what layers to apply LoRA to. Since LoRA supports 2D conv layers, and since
those are a major building block of this model, we should apply LoRA to the 2D conv layers. To identify the names of
those layers, let's look at all the layer names:
```python
print([(n, type(m)) for n, m in model.named_modules()])
```
This will print a very long list; we'll only show the first few entries:
```
[('', timm.models.metaformer.MetaFormer),
('stem', timm.models.metaformer.Stem),
('stem.conv', torch.nn.modules.conv.Conv2d),
('stem.norm', torch.nn.modules.linear.Identity),
('stages', torch.nn.modules.container.Sequential),
('stages.0', timm.models.metaformer.MetaFormerStage),
('stages.0.downsample', torch.nn.modules.linear.Identity),
('stages.0.blocks', torch.nn.modules.container.Sequential),
('stages.0.blocks.0', timm.models.metaformer.MetaFormerBlock),
('stages.0.blocks.0.norm1', timm.layers.norm.GroupNorm1),
('stages.0.blocks.0.token_mixer', timm.models.metaformer.Pooling),
('stages.0.blocks.0.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
('stages.0.blocks.0.drop_path1', torch.nn.modules.linear.Identity),
('stages.0.blocks.0.layer_scale1', timm.models.metaformer.Scale),
('stages.0.blocks.0.res_scale1', torch.nn.modules.linear.Identity),
('stages.0.blocks.0.norm2', timm.layers.norm.GroupNorm1),
('stages.0.blocks.0.mlp', timm.layers.mlp.Mlp),
('stages.0.blocks.0.mlp.fc1', torch.nn.modules.conv.Conv2d),
('stages.0.blocks.0.mlp.act', torch.nn.modules.activation.GELU),
('stages.0.blocks.0.mlp.drop1', torch.nn.modules.dropout.Dropout),
('stages.0.blocks.0.mlp.norm', torch.nn.modules.linear.Identity),
('stages.0.blocks.0.mlp.fc2', torch.nn.modules.conv.Conv2d),
('stages.0.blocks.0.mlp.drop2', torch.nn.modules.dropout.Dropout),
('stages.0.blocks.0.drop_path2', torch.nn.modules.linear.Identity),
('stages.0.blocks.0.layer_scale2', timm.models.metaformer.Scale),
('stages.0.blocks.0.res_scale2', torch.nn.modules.linear.Identity),
('stages.0.blocks.1', timm.models.metaformer.MetaFormerBlock),
('stages.0.blocks.1.norm1', timm.layers.norm.GroupNorm1),
('stages.0.blocks.1.token_mixer', timm.models.metaformer.Pooling),
('stages.0.blocks.1.token_mixer.pool', torch.nn.modules.pooling.AvgPool2d),
...
('head.global_pool.flatten', torch.nn.modules.linear.Identity),
('head.norm', timm.layers.norm.LayerNorm2d),
('head.flatten', torch.nn.modules.flatten.Flatten),
('head.drop', torch.nn.modules.linear.Identity),
('head.fc', torch.nn.modules.linear.Linear)]
```
Upon closer inspection, we see that the 2D conv layers have names such as `"stages.0.blocks.0.mlp.fc1"` and
`"stages.0.blocks.0.mlp.fc2"`. How can we match those layer names specifically? You can write a [regular
expressions](https://docs.python.org/3/library/re.html) to match the layer names. For our case, the regex
`r".*\.mlp\.fc\d"` should do the job.
Furthermore, as in the first example, we should ensure that the output layer, in this case the classification head, is
also updated. Looking at the end of the list printed above, we can see that it's named `'head.fc'`. With that in mind,
here is our LoRA config:
```python
config = LoraConfig(target_modules=r".*\.mlp\.fc\d", modules_to_save=["head.fc"])
```
Then we only need to create the PEFT model by passing our base model and the config to `get_peft_model`:
```python
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()
# prints trainable params: 1,064,454 || all params: 56,467,974 || trainable%: 1.88505789139876
```
This shows us that we only need to train less than 2% of all parameters, which is a huge efficiency gain.
For a complete example, check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/image_classification/image_classification_timm_peft_lora.ipynb).
## New transformers architectures
When new popular transformers architectures are released, we do our best to quickly add them to PEFT. If you come across a transformers model that is not supported out of the box, don't worry: it will most likely still work if the config is set correctly. Specifically, you have to identify the layers that should be adapted and set them correctly when initializing the corresponding config class, e.g. `LoraConfig`. Here are some tips to help with this.
As a first step, it is a good idea to check the existing models for inspiration. You can find them inside of [constants.py](https://github.com/huggingface/peft/blob/main/src/peft/utils/constants.py) in the PEFT repository. Often, you'll find a similar architecture that uses the same names. For example, if the new model architecture is a variation of the "mistral" model and you want to apply LoRA, you can see that the entry for "mistral" in `TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING` contains `["q_proj", "v_proj"]`. This tells you that for "mistral" models, the `target_modules` for LoRA should be `["q_proj", "v_proj"]`:
```python
from peft import LoraConfig, get_peft_model
my_mistral_model = ...
config = LoraConfig(
target_modules=["q_proj", "v_proj"],
..., # other LoRA arguments
)
peft_model = get_peft_model(my_mistral_model, config)
```
If that doesn't help, check the existing modules in your model architecture with the `named_modules` method and try to identify the attention layers, especially the key, query, and value layers. Those will often have names such as `c_attn`, `query`, `q_proj`, etc. The key layer is not always adapted, and ideally, you should check whether including it results in better performance.
Additionally, linear layers are common targets to be adapted (e.g. in the [QLoRA paper](https://arxiv.org/abs/2305.14314), the authors suggest adapting them as well). Their names will often contain the strings `fc` or `dense`.
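One quick, non-authoritative way to collect candidates is to filter `named_modules` for linear layers and inspect their names (`my_model` stands in for your model):
```python
from torch import nn

linear_layer_names = [name for name, module in my_model.named_modules() if isinstance(module, nn.Linear)]
print(linear_layer_names)
```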
If you want to add a new model to PEFT, please create an entry in [constants.py](https://github.com/huggingface/peft/blob/main/src/peft/utils/constants.py) and open a pull request on the [repository](https://github.com/huggingface/peft/pulls). Don't forget to update the [README](https://github.com/huggingface/peft#models-support-matrix) as well.
## Verify parameters and layers
You can verify whether you've correctly applied a PEFT method to your model in a few ways.
* Check the fraction of parameters that are trainable with the [`~PeftModel.print_trainable_parameters`] method. If this number is lower or higher than expected, check the model `repr` by printing the model. This shows the names of all the layer types in the model. Ensure that only the intended target layers are replaced by the adapter layers. For example, if LoRA is applied to `nn.Linear` layers, then you should only see `lora.Linear` layers being used.
```py
peft_model.print_trainable_parameters()
```
* Another way you can view the adapted layers is to use the `targeted_module_names` attribute to list the name of each module that was adapted.
```python
print(peft_model.targeted_module_names)
```
## Unsupported module types
Methods like LoRA only work if the target modules are supported by PEFT. For example, it's possible to apply LoRA to `nn.Linear` and `nn.Conv2d` layers, but not, for instance, to `nn.LSTM`. If you find a layer class you want to apply PEFT to is not supported, you can:
- define a custom mapping to dynamically dispatch custom modules in LoRA
- open an [issue](https://github.com/huggingface/peft/issues) and request the feature where maintainers will implement it or guide you on how to implement it yourself if demand for this module type is sufficiently high
### Experimental support for dynamic dispatch of custom modules in LoRA
> [!WARNING]
> This feature is experimental and subject to change, depending on its reception by the community. We will introduce a public and stable API if there is significant demand for it.
PEFT supports an experimental API for custom module types for LoRA. Let's assume you have a LoRA implementation for LSTMs. Normally, you would not be able to tell PEFT to use it, even if it would theoretically work with PEFT. However, this is possible with dynamic dispatch of custom layers.
The experimental API currently looks like this:
```python
class MyLoraLSTMLayer:
    ...
base_model = ... # load the base model that uses LSTMs
# add the LSTM layer names to target_modules
config = LoraConfig(..., target_modules=["lstm"])
# define a mapping from base layer type to LoRA layer type
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
# register the new mapping
config._register_custom_module(custom_module_mapping)
# after registration, create the PEFT model
peft_model = get_peft_model(base_model, config)
# do training
```
<Tip>
When you call [`get_peft_model`], you will see a warning because PEFT does not recognize the targeted module type. In this case, you can ignore this warning.
</Tip>
By supplying a custom mapping, PEFT first checks the base model's layers against the custom mapping and dispatches to the custom LoRA layer type if there is a match. If there is no match, PEFT checks the built-in LoRA layer types for a match.
Therefore, this feature can also be used to override existing dispatch logic, e.g. if you want to use your own LoRA layer for `nn.Linear` instead of using the one provided by PEFT.
When creating your custom LoRA module, please follow the same rules as the [existing LoRA modules](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py). Some important constraints to consider:
- The custom module should inherit from `nn.Module` and `peft.tuners.lora.layer.LoraLayer`.
- The `__init__` method of the custom module should have the positional arguments `base_layer` and `adapter_name`. After this, there are additional `**kwargs` that you are free to use or ignore.
- The learnable parameters should be stored in an `nn.ModuleDict` or `nn.ParameterDict`, where the key corresponds to the name of the specific adapter (remember that a model can have more than one adapter at a time).
- The name of these learnable parameter attributes should start with `"lora_"`, e.g. `self.lora_new_param = ...`.
- Some methods are optional, e.g. you only need to implement `merge` and `unmerge` if you want to support weight merging.
Currently, the information about the custom module does not persist when you save the model. When loading the model, you have to register the custom modules again.
```python
# saving works as always and includes the parameters of the custom modules
peft_model.save_pretrained(<model-path>)
# loading the model later:
base_model = ...
# load the LoRA config that you saved earlier
config = LoraConfig.from_pretrained(<model-path>)
# register the custom module again, the same way as the first time
custom_module_mapping = {nn.LSTM: MyLoraLSTMLayer}
config._register_custom_module(custom_module_mapping)
# pass the config instance to from_pretrained:
peft_model = PeftModel.from_pretrained(base_model, <model-path>, config=config)
```
If you use this feature and find it useful, or if you encounter problems, let us know by creating an issue or a discussion on GitHub. This allows us to estimate the demand for this feature and add a public API if it is sufficiently high.

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoRA
LoRA is a low-rank decomposition method that reduces the number of trainable parameters, which speeds up finetuning of large models and uses less memory. In PEFT, using LoRA is as easy as setting up a [`LoraConfig`] and wrapping it with [`get_peft_model`] to create a trainable [`PeftModel`].
This guide explores in more detail other options and features for using LoRA.
## Initialization
The initialization of LoRA weights is controlled by the parameter `init_lora_weights` in [`LoraConfig`]. By default, PEFT initializes LoRA weights with Kaiming-uniform for weight A and zeros for weight B resulting in an identity transform (same as the reference [implementation](https://github.com/microsoft/LoRA)).
It is also possible to pass `init_lora_weights="gaussian"`. As the name suggests, this initializes weight A with a Gaussian distribution and zeros for weight B (this is how [Diffusers](https://huggingface.co/docs/diffusers/index) initializes LoRA weights).
```py
from peft import LoraConfig
config = LoraConfig(init_lora_weights="gaussian", ...)
```
There is also an option to set `init_lora_weights=False` which is useful for debugging and testing. This should be the only time you use this option. When choosing this option, the LoRA weights are initialized such that they do *not* result in an identity transform.
```py
from peft import LoraConfig
config = LoraConfig(init_lora_weights=False, ...)
```
### PiSSA
[PiSSA](https://arxiv.org/abs/2404.02948) initializes the LoRA adapter using the principal singular values and singular vectors. This straightforward modification allows PiSSA to converge more rapidly than LoRA and ultimately attain superior performance. Moreover, PiSSA reduces the quantization error compared to QLoRA, leading to further enhancements.
Configure the initialization method to "pissa", which may take several minutes to execute SVD on the pre-trained model:
```python
from peft import LoraConfig
config = LoraConfig(init_lora_weights="pissa", ...)
```
Alternatively, execute fast SVD, which takes only a few seconds. The number of iterations determines the trade-off between the error and computation time:
```python
lora_config = LoraConfig(init_lora_weights="pissa_niter_[number of iters]", ...)
```
For detailed instructions on using PiSSA, please follow [these instructions](https://github.com/fxmeng/peft/tree/main/examples/pissa_finetuning).
### OLoRA
[OLoRA](https://arxiv.org/abs/2406.01775) utilizes QR decomposition to initialize the LoRA adapters. OLoRA translates the base weights of the model by a factor of their QR decompositions, i.e., it mutates the weights before performing any training on them. This approach significantly improves stability, accelerates convergence speed, and ultimately achieves superior performance.
You just need to pass a single additional option to use OLoRA:
```python
from peft import LoraConfig
config = LoraConfig(init_lora_weights="olora", ...)
```
For more advanced usage, please refer to our [documentation](https://github.com/huggingface/peft/tree/main/examples/olora_finetuning).
### LoftQ
#### Standard approach
When quantizing the base model for QLoRA training, consider using the [LoftQ initialization](https://arxiv.org/abs/2310.08659), which has been shown to improve performance when training quantized models. The idea is that the LoRA weights are initialized such that the quantization error is minimized. To use LoftQ, follow [these instructions](https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning).
In general, for LoftQ to work best, it is recommended to target as many layers with LoRA as possible, since those not targeted cannot have LoftQ applied. This means that passing `LoraConfig(..., target_modules="all-linear")` will most likely give the best results. Also, you should use `nf4` as quant type in your quantization config when using 4bit quantization, i.e. `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")`.
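As a rough sketch, the quantization and LoRA settings recommended above could be combined like this (the model id is a placeholder, and the LoftQ initialization itself is configured as described in the linked instructions):
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
base_model = AutoModelForCausalLM.from_pretrained("your/model-id", quantization_config=bnb_config)
lora_config = LoraConfig(target_modules="all-linear", task_type="CAUSAL_LM")
peft_model = get_peft_model(base_model, lora_config)
```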
#### A more convenient way
An easier but more limited way to apply LoftQ initialization is to use the convenience function `replace_lora_weights_loftq`. This takes the quantized PEFT model as input and replaces the LoRA weights in-place with their LoftQ-initialized counterparts.
```python
from peft import LoraConfig, get_peft_model, replace_lora_weights_loftq
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(load_in_4bit=True, ...)
base_model = AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)
# note: don't pass init_lora_weights="loftq" or loftq_config!
lora_config = LoraConfig(task_type="CAUSAL_LM")
peft_model = get_peft_model(base_model, lora_config)
replace_lora_weights_loftq(peft_model)
```
`replace_lora_weights_loftq` also allows you to pass a `callback` argument to give you more control over which layers should be modified or not, which empirically can improve the results quite a lot. To see a more elaborate example of this, check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/loftq_finetuning/LoftQ_weight_replacement.ipynb).
`replace_lora_weights_loftq` implements only one iteration step of LoftQ. This means that only the LoRA weights are updated, instead of iteratively updating LoRA weights and quantized base model weights. This may lead to lower performance but has the advantage that we can use the original quantized weights derived from the base model, instead of having to keep an extra copy of modified quantized weights. Whether this tradeoff is worthwhile depends on the use case.
At the moment, `replace_lora_weights_loftq` has these additional limitations:
- Model files must be stored as a `safetensors` file.
- Only bitsandbytes 4bit quantization is supported.
<Tip>
Learn more about how PEFT works with quantization in the [Quantization](quantization) guide.
</Tip>
### Rank-stabilized LoRA
Another way to initialize [`LoraConfig`] is with the [rank-stabilized LoRA (rsLoRA)](https://huggingface.co/papers/2312.03732) method. The LoRA architecture scales each adapter during every forward pass by a fixed scalar which is set at initialization and depends on the rank `r`. The scalar is given by `lora_alpha/r` in the original implementation, but rsLoRA uses `lora_alpha/math.sqrt(r)` which stabilizes the adapters and increases the performance potential from using a higher `r`.
```py
from peft import LoraConfig
config = LoraConfig(use_rslora=True, ...)
```
### Weight-Decomposed Low-Rank Adaptation (DoRA)
This technique decomposes the updates of the weights into two parts, magnitude and direction. Direction is handled by normal LoRA, whereas the magnitude is handled by a separate learnable parameter. This can improve the performance of LoRA, especially at low ranks. For more information on DoRA, see https://arxiv.org/abs/2402.09353.
```py
from peft import LoraConfig
config = LoraConfig(use_dora=True, ...)
```
If parts of the model or the DoRA adapter are offloaded to CPU you can get a significant speedup at the cost of some temporary (ephemeral) VRAM overhead by using `ephemeral_gpu_offload=True` in `config.runtime_config`.
```py
from peft import LoraConfig, LoraRuntimeConfig
config = LoraConfig(use_dora=True, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=True), ...)
```
A `PeftModel` with a DoRA adapter can also be loaded with the `ephemeral_gpu_offload=True` flag using the `from_pretrained` method as well as the `load_adapter` method.
```py
from peft import PeftModel
model = PeftModel.from_pretrained(base_model, peft_model_id, ephemeral_gpu_offload=True)
```
#### Caveats
- DoRA only supports linear and Conv2d layers at the moment.
- DoRA introduces a bigger overhead than pure LoRA, so it is recommended to merge weights for inference, see [`LoraModel.merge_and_unload`].
- DoRA should work with weights quantized with bitsandbytes ("QDoRA"). However, issues have been reported when using QDoRA with DeepSpeed Zero2.
### QLoRA-style training
The default LoRA settings in PEFT add trainable weights to the query and value layers of each attention block. But [QLoRA](https://hf.co/papers/2305.14314), which adds trainable weights to all the linear layers of a transformer model, can provide performance equal to a fully finetuned model. To apply LoRA to all the linear layers, like in QLoRA, set `target_modules="all-linear"` (easier than specifying individual modules by name which can vary depending on the architecture).
```py
config = LoraConfig(target_modules="all-linear", ...)
```
### Memory efficient Layer Replication with LoRA
An approach used to improve the performance of models is to expand a model by duplicating layers to build a larger model from a pretrained model of a given size, for example increasing a 7B model to a 10B model as described in the [SOLAR](https://arxiv.org/abs/2312.15166) paper. PEFT LoRA supports this kind of expansion in a memory-efficient manner: the replicated layers share the underlying weights, so the only additional memory required is for the adapter weights, and the model can be further fine-tuned with LoRA adapters attached to the layers after replication. To use this feature, create a config with the `layer_replication` argument.
```py
config = LoraConfig(layer_replication=[[0,4], [2,5]], ...)
```
Assuming the original model had 5 layers `[0, 1, 2 ,3, 4]`, this would create a model with 7 layers arranged as `[0, 1, 2, 3, 2, 3, 4]`. This follows the [mergekit](https://github.com/arcee-ai/mergekit) pass through merge convention where sequences of layers specified as start inclusive and end exclusive tuples are stacked to build the final model. Each layer in the final model gets its own distinct set of LoRA adapters.
[Fewshot-Metamath-OrcaVicuna-Mistral-10B](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B) is an example of a model trained using this method on Mistral-7B expanded to 10B. The
[adapter_config.json](https://huggingface.co/abacusai/Fewshot-Metamath-OrcaVicuna-Mistral-10B/blob/main/adapter_config.json) shows a sample LoRA adapter config applying this method for fine-tuning.
## Optimizers
LoRA training can optionally include special purpose optimizers. Currently the only such optimizer is LoRA+.
### LoRA+ optimized LoRA
LoRA training can be optimized using [LoRA+](https://arxiv.org/abs/2402.12354), which uses different learning rates for the adapter matrices A and B, shown to increase finetuning speed by up to 2x and performance by 1-2%.
```py
from peft import LoraConfig, get_peft_model
from peft.optimizers import create_loraplus_optimizer
from transformers import Trainer
import bitsandbytes as bnb
base_model = ...
config = LoraConfig(...)
model = get_peft_model(base_model, config)
optimizer = create_loraplus_optimizer(
    model=model,
    optimizer_cls=bnb.optim.Adam8bit,
    lr=5e-5,
    loraplus_lr_ratio=16,
)
scheduler = None
...
trainer = Trainer(
    ...,
    optimizers=(optimizer, scheduler),
)
```
## Merge LoRA weights into the base model
While LoRA is significantly smaller and faster to train, you may encounter latency issues during inference due to separately loading the base model and the LoRA adapter. To eliminate latency, use the [`~LoraModel.merge_and_unload`] function to merge the adapter weights with the base model. This allows you to use the newly merged model as a standalone model. The [`~LoraModel.merge_and_unload`] function doesn't keep the adapter weights in memory.
Below is a diagram that explains the intuition of LoRA adapter merging:
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/lora_diagram.png"/>
</div>
We show in the snippets below how to run that using PEFT.
```py
from transformers import AutoModelForCausalLM
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
peft_model_id = "alignment-handbook/zephyr-7b-sft-lora"
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.merge_and_unload()
```
If you need to keep a copy of the weights so you can unmerge the adapter later or delete and load different ones, you should use the [`~LoraModel.merge_adapter`] function instead. Now you have the option to use [`~LoraModel.unmerge_adapter`] to return the base model.
```py
from transformers import AutoModelForCausalLM
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
peft_model_id = "alignment-handbook/zephyr-7b-sft-lora"
model = PeftModel.from_pretrained(base_model, peft_model_id)
model.merge_adapter()
# unmerge the LoRA layers from the base model
model.unmerge_adapter()
```
The [`~LoraModel.add_weighted_adapter`] function is useful for merging multiple LoRAs into a new adapter based on a user provided weighting scheme in the `weights` parameter. Below is an end-to-end example.
First load the base model:
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch
base_model = AutoModelForCausalLM.from_pretrained(
"mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, device_map="auto"
)
```
Then we load the first adapter:
```python
peft_model_id = "alignment-handbook/zephyr-7b-sft-lora"
model = PeftModel.from_pretrained(base_model, peft_model_id, adapter_name="sft")
```
Then load a different adapter and merge it with the first one:
```python
weighted_adapter_name = "sft-dpo"
model.load_adapter("alignment-handbook/zephyr-7b-dpo-lora", adapter_name="dpo")
model.add_weighted_adapter(
adapters=["sft", "dpo"],
weights=[0.7, 0.3],
adapter_name=weighted_adapter_name,
combination_type="linear"
)
model.set_adapter(weighted_adapter_name)
```
<Tip>
There are several supported methods for `combination_type`. Refer to the [documentation](../package_reference/lora#peft.LoraModel.add_weighted_adapter) for more details. Note that "svd" as the `combination_type` is not supported when using `torch.float16` or `torch.bfloat16` as the datatype.
</Tip>
Now, perform inference:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
with torch.no_grad():
    generate_ids = model.generate(**inputs, max_length=30)
outputs = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(outputs)
```
## Load adapters
Adapters can be loaded onto a pretrained model with [`~PeftModel.load_adapter`], which is useful for trying out different adapters whose weights aren't merged. Set the active adapter weights with the [`~LoraModel.set_adapter`] function.
```py
from transformers import AutoModelForCausalLM
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
peft_model_id = "alignment-handbook/zephyr-7b-sft-lora"
model = PeftModel.from_pretrained(base_model, peft_model_id)
# load different adapter
model.load_adapter("alignment-handbook/zephyr-7b-dpo-lora", adapter_name="dpo")
# set adapter as active
model.set_adapter("dpo")
```
To return the base model, you could use [`~LoraModel.unload`] to unload all of the LoRA modules or [`~LoraModel.delete_adapter`] to delete the adapter entirely.
```py
# unload adapter
model.unload()
# delete adapter
model.delete_adapter("dpo")
```
## Inference with different LoRA adapters in the same batch
Normally, each inference batch has to use the same adapter(s) in PEFT. This can sometimes be annoying, because we may have batches that contain samples intended to be used with different LoRA adapters. For example, we could have a base model that works well in English and two more LoRA adapters, one for French and one for German. Usually, we would have to split our batches such that each batch only contains samples of one language; we cannot combine different languages in the same batch.
Thankfully, it is possible to mix different LoRA adapters in the same batch using the `adapter_names` argument. Below, we show an example of how this works in practice. First, let's load the base model, English, and the two adapters, French and German, like this:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
model_id = ...
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# load the LoRA adapter for French
peft_model = PeftModel.from_pretrained(model, <path>, adapter_name="adapter_fr")
# next, load the LoRA adapter for German
peft_model.load_adapter(<path>, adapter_name="adapter_de")
```
Now, we want to generate text on a sample that contains all three languages: The first three samples are in English, the next three are in French, and the last three are in German. We can use the `adapter_names` argument to specify which adapter to use for each sample. Since our base model is used for English, we use the special string `"__base__"` for these samples. For the next three samples, we indicate the adapter name of the French LoRA fine-tune, in this case `"adapter_fr"`. For the last three samples, we indicate the adapter name of the German LoRA fine-tune, in this case `"adapter_de"`. This way, we can use the base model and the two adapters in a single batch.
```python
inputs = tokenizer(
    [
        "Hello, my dog is cute",
        "Hello, my cat is awesome",
        "Hello, my fish is great",
        "Salut, mon chien est mignon",
        "Salut, mon chat est génial",
        "Salut, mon poisson est super",
        "Hallo, mein Hund ist süß",
        "Hallo, meine Katze ist toll",
        "Hallo, mein Fisch ist großartig",
    ],
    return_tensors="pt",
    padding=True,
)
adapter_names = [
    "__base__", "__base__", "__base__",
    "adapter_fr", "adapter_fr", "adapter_fr",
    "adapter_de", "adapter_de", "adapter_de",
]
output = peft_model.generate(**inputs, adapter_names=adapter_names, max_new_tokens=20)
```
Note that the order does not matter here, i.e. the samples in the batch don't need to be grouped by adapter as in the example above. We just need to ensure that the `adapter_names` argument is aligned correctly with the samples.
Additionally, the same approach also works with the `modules_to_save` feature, which allows for saving and reusing specific neural network layers, such as custom heads for classification tasks, across different LoRA adapters.
### Caveats
Using this feature has some drawbacks, namely:
- It only works for inference, not for training.
- Disabling adapters using the `with model.disable_adapter()` context takes precedence over `adapter_names`.
- You cannot pass `adapter_names` when some adapter weights were merged with the base weights using the `merge_adapter` method. Please unmerge all adapters first by calling `model.unmerge_adapter()`.
- For obvious reasons, this cannot be used after calling `merge_and_unload()`, since all the LoRA adapters will be merged into the base weights in this case.
- This feature does not currently work with DoRA, so set `use_dora=False` in your `LoraConfig` if you want to use it.
- The `modules_to_save` feature is currently only supported for the layers of types `Linear`, `Embedding`, `Conv2d` and `Conv1d`.
- There is an expected overhead for inference with `adapter_names`, especially if the amount of different adapters in the batch is high. This is because the batch size is effectively reduced to the number of samples per adapter. If runtime performance is your top priority, try the following:
  - Increase the batch size.
  - Try to avoid having a large number of different adapters in the same batch; prefer homogeneous batches. This can be achieved by buffering samples with the same adapter and only performing inference with a small handful of different adapters (a rough sketch of this is shown after this list).
  - Take a look at alternative implementations such as [LoRAX](https://github.com/predibase/lorax), [punica](https://github.com/punica-ai/punica), or [S-LoRA](https://github.com/S-LoRA/S-LoRA), which are specialized to work with a large number of different adapters.
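To illustrate the buffering idea, here is a rough sketch that reuses `tokenizer` and `peft_model` from the example above and assumes `prompts` is the list of input strings and `adapter_names` the matching per-sample adapter list:
```python
from collections import defaultdict

buckets = defaultdict(list)
for prompt, adapter in zip(prompts, adapter_names):
    buckets[adapter].append(prompt)

# each generate() call now only uses a single adapter (or the base model via "__base__")
for adapter, bucket in buckets.items():
    batch = tokenizer(bucket, return_tensors="pt", padding=True)
    output = peft_model.generate(**batch, adapter_names=[adapter] * len(bucket), max_new_tokens=20)
```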

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Adapter injection
With PEFT, you can inject trainable adapters into any `torch` module, which allows you to use adapter methods without relying on the modeling classes in PEFT. Currently, PEFT supports injecting [LoRA](../conceptual_guides/adapter#low-rank-adaptation-lora), [AdaLoRA](../conceptual_guides/adapter#adaptive-low-rank-adaptation-adalora), and [IA3](../conceptual_guides/ia3) into models, because for these adapters, in-place modification of the model is sufficient for finetuning it.
Check the table below to see when you should inject adapters.
| Pros | Cons |
|---|---|
| the model is modified in place, keeping all the original attributes and methods | you have to manually write the Hugging Face `from_pretrained` and `save_pretrained` utility functions to save and load adapters |
| works for any `torch` module and modality | doesn't work with any of the utility methods provided by `PeftModel` such as disabling and merging adapters |
## Creating a new PEFT model
To perform the adapter injection, use the [`inject_adapter_in_model`] method. This method takes 3 arguments: the PEFT config, the model, and an optional adapter name. You can also attach multiple adapters to the model if you call [`inject_adapter_in_model`] multiple times with different adapter names.
For example, to inject LoRA adapters into the `linear` submodule of the `DummyModel` module:
```python
import torch
from peft import inject_adapter_in_model, LoraConfig
class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = torch.nn.Embedding(10, 10)
        self.linear = torch.nn.Linear(10, 10)
        self.lm_head = torch.nn.Linear(10, 10)

    def forward(self, input_ids):
        x = self.embedding(input_ids)
        x = self.linear(x)
        x = self.lm_head(x)
        return x
lora_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    target_modules=["linear"],
)
model = DummyModel()
model = inject_adapter_in_model(lora_config, model)
dummy_inputs = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7]])
dummy_outputs = model(dummy_inputs)
```
Print the model to see that the adapters have been correctly injected.
```bash
DummyModel(
  (embedding): Embedding(10, 10)
  (linear): Linear(
    in_features=10, out_features=10, bias=True
    (lora_dropout): ModuleDict(
      (default): Dropout(p=0.1, inplace=False)
    )
    (lora_A): ModuleDict(
      (default): Linear(in_features=10, out_features=64, bias=False)
    )
    (lora_B): ModuleDict(
      (default): Linear(in_features=64, out_features=10, bias=False)
    )
    (lora_embedding_A): ParameterDict()
    (lora_embedding_B): ParameterDict()
  )
  (lm_head): Linear(in_features=10, out_features=10, bias=True)
)
```
## Saving the model
To only save the adapter, use the [`get_peft_model_state_dict`] function:
```python
from peft import get_peft_model_state_dict
peft_state_dict = get_peft_model_state_dict(model)
print(peft_state_dict)
```
Otherwise, `model.state_dict()` returns the full state dict of the model.
## Loading the model
After loading the saved `state_dict`, it can be applied using the [`set_peft_model_state_dict`] function:
```python
from peft import set_peft_model_state_dict
model = DummyModel()
model = inject_adapter_in_model(lora_config, model)
outcome = set_peft_model_state_dict(model, peft_state_dict)
# check that there were no wrong keys
print(outcome.unexpected_keys)
```
If injecting the adapter is slow or you need to load a large number of adapters, you may use an optimization that creates an "empty" adapter on the meta device and only fills in the real weights when [`set_peft_model_state_dict`] is called. To do this, pass `low_cpu_mem_usage=True` to both [`inject_adapter_in_model`] and [`set_peft_model_state_dict`].
```python
model = DummyModel()
model = inject_adapter_in_model(lora_config, model, low_cpu_mem_usage=True)
print(model.linear.lora_A["default"].weight.device.type == "meta") # should be True
set_peft_model_state_dict(model, peft_state_dict, low_cpu_mem_usage=True)
print(model.linear.lora_A["default"].weight.device.type == "cpu") # should be True
```

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Mixed adapter types
Normally, it isn't possible to mix different adapter types in 🤗 PEFT. You can create a PEFT model with two different LoRA adapters (which can have different config options), but it is not possible to combine a LoRA and LoHa adapter. With [`PeftMixedModel`] however, this works as long as the adapter types are compatible. The main purpose of allowing mixed adapter types is to combine trained adapters for inference. While it is possible to train a mixed adapter model, this has not been tested and is not recommended.
To load different adapter types into a PEFT model, use [`PeftMixedModel`] instead of [`PeftModel`]:
```py
from peft import PeftMixedModel
base_model = ... # load the base model, e.g. from transformers
# load first adapter, which will be called "default"
peft_model = PeftMixedModel.from_pretrained(base_model, <path_to_adapter1>)
peft_model.load_adapter(<path_to_adapter2>, adapter_name="other")
peft_model.set_adapter(["default", "other"])
```
The [`~PeftMixedModel.set_adapter`] method is necessary to activate both adapters, otherwise only the first adapter would be active. You can keep adding more adapters by calling [`~PeftModel.add_adapter`] repeatedly.
[`PeftMixedModel`] does not support saving and loading mixed adapters. The adapters should already be trained, and loading the model requires running a script each time.
## Tips
- Not all adapter types can be combined. See [`peft.tuners.mixed.COMPATIBLE_TUNER_TYPES`](https://github.com/huggingface/peft/blob/1c1c7fdaa6e6abaa53939b865dee1eded82ad032/src/peft/tuners/mixed/model.py#L35) for a list of compatible types. An error will be raised if you try to combine incompatible adapter types.
- It is possible to mix multiple adapters of the same type which can be useful for combining adapters with very different configs.
- If you want to combine a lot of different adapters, the most performant way to do it is to consecutively add the same adapter types. For example, add LoRA1, LoRA2, LoHa1, LoHa2 in this order, instead of LoRA1, LoHa1, LoRA2, and LoHa2. While the order can affect the output, there is no inherently *best* order, so it is best to choose the fastest one.
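For illustration, here is a minimal sketch of loading several adapters grouped by type, reusing the calls shown above (the adapter paths and names are placeholders):
```py
from peft import PeftMixedModel

base_model = ...  # load the base model, e.g. from transformers
peft_model = PeftMixedModel.from_pretrained(base_model, <path_to_lora1>)  # loaded as "default"
peft_model.load_adapter(<path_to_lora2>, adapter_name="lora2")
peft_model.load_adapter(<path_to_loha1>, adapter_name="loha1")
peft_model.load_adapter(<path_to_loha2>, adapter_name="loha2")
peft_model.set_adapter(["default", "lora2", "loha1", "loha2"])
```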

@ -0,0 +1,157 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Model merging
Training a model for each task can be costly, take up storage space, and the models aren't able to learn new information to improve their performance. Multitask learning can overcome some of these limitations by training a model to learn several tasks, but it is expensive to train and designing a dataset for it is challenging. *Model merging* offers a solution to these challenges by combining multiple pretrained models into one model, giving it the combined abilities of each individual model without any additional training.
PEFT provides several methods for merging models like a linear or SVD combination. This guide focuses on two methods that are more efficient for merging LoRA adapters by eliminating redundant parameters:
* [TIES](https://hf.co/papers/2306.01708) - TrIm, Elect, and Merge (TIES) is a three-step method for merging models. First, redundant parameters are trimmed, then conflicting signs are resolved into an aggregated vector, and finally the parameters whose signs are the same as the aggregate sign are averaged. This method takes into account that some values (redundant and sign disagreement) can degrade performance in the merged model.
* [DARE](https://hf.co/papers/2311.03099) - Drop And REscale is a method that can be used to prepare for other model merging methods like TIES. It works by randomly dropping parameters according to a drop rate and rescaling the remaining parameters. This helps to reduce the number of redundant and potentially interfering parameters among multiple models.
Models are merged with the [`~LoraModel.add_weighted_adapter`] method, and the specific model merging method is specified in the `combination_type` parameter.
## Merge method
With TIES and DARE, merging is enabled by setting `combination_type` and `density`, where `density` is the fraction of weights to keep from the individual models. For example, let's merge three finetuned [TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T) models: [tinyllama_lora_norobots](https://huggingface.co/smangrul/tinyllama_lora_norobots), [tinyllama_lora_sql](https://huggingface.co/smangrul/tinyllama_lora_sql), and [tinyllama_lora_adcopy](https://huggingface.co/smangrul/tinyllama_lora_adcopy).
<Tip warning={true}>
When you're attempting to merge fully trained models with TIES, you should be aware of any special tokens each model may have added to the embedding layer which are not a part of the original checkpoint's vocabulary. This may cause an issue because each model may have added a special token to the same embedding position. If this is the case, you should use the [`~transformers.PreTrainedModel.resize_token_embeddings`] method to avoid merging the special tokens at the same embedding index.
<br>
This shouldn't be an issue if you're only merging LoRA adapters trained from the same base model.
</Tip>
Load a base model and use the [`~PeftModel.load_adapter`] method to load each adapter and assign it a name:
```py
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
config = PeftConfig.from_pretrained("smangrul/tinyllama_lora_norobots")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in_4bit=True, device_map="auto").eval()
tokenizer = AutoTokenizer.from_pretrained("smangrul/tinyllama_lora_norobots")
model = PeftModel.from_pretrained(model, "smangrul/tinyllama_lora_norobots", adapter_name="norobots")
_ = model.load_adapter("smangrul/tinyllama_lora_sql", adapter_name="sql")
_ = model.load_adapter("smangrul/tinyllama_lora_adcopy", adapter_name="adcopy")
```
Set the adapters, weights, `adapter_name`, `combination_type`, and `density` with the [`~LoraModel.add_weighted_adapter`] method.
<hfoptions id="merge-method">
<hfoption id="TIES">
Weight values greater than `1.0` typically produce better results because they preserve the correct scale. A good default starting value for the weights is to set all values to `1.0`.
```py
adapters = ["norobots", "adcopy", "sql"]
weights = [2.0, 1.0, 1.0]
adapter_name = "merge"
density = 0.2
model.add_weighted_adapter(adapters, weights, adapter_name, combination_type="ties", density=density)
```
</hfoption>
<hfoption id="DARE">
```py
adapters = ["norobots", "adcopy", "sql"]
weights = [2.0, 0.3, 0.7]
adapter_name = "merge"
density = 0.2
model.add_weighted_adapter(adapters, weights, adapter_name, combination_type="dare_ties", density=density)
```
</hfoption>
</hfoptions>
Set the newly merged model as the active model with the [`~LoraModel.set_adapter`] method.
```py
model.set_adapter("merge")
```
Now you can use the merged model as an instruction-tuned model to write ad copy or SQL queries!
<hfoptions id="ties">
<hfoption id="instruct">
```py
messages = [
    {"role": "user", "content": "Write an essay about Generative AI."},
]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.95, temperature=0.2, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
```
</hfoption>
<hfoption id="ad copy">
```py
messages = [
    {"role": "system", "content": "Create a text ad given the following product and description."},
    {"role": "user", "content": "Product: Sony PS5 PlayStation Console\nDescription: The PS5 console unleashes new gaming possibilities that you never anticipated."},
]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.95, temperature=0.2, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
```
</hfoption>
<hfoption id="SQL">
```py
text = """Table: 2-11365528-2
Columns: ['Team', 'Head Coach', 'President', 'Home Ground', 'Location']
Natural Query: Who is the Head Coach of the team whose President is Mario Volarevic?
SQL Query:"""
inputs = tokenizer(text, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1, eos_token_id=tokenizer("</s>").input_ids[-1])
print(tokenizer.decode(outputs[0]))
```
</hfoption>
</hfoptions>
## Merging (IA)³ Models
The (IA)³ models facilitate linear merging of adapters. To merge adapters in an (IA)³ model, utilize the `add_weighted_adapter` method from the `IA3Model` class. This method is analogous to the `add_weighted_adapter` method used in `LoraModel`, with the key difference being the absence of the `combination_type` parameter. For example, to merge three (IA)³ adapters into a PEFT model, you would proceed as follows:
```py
adapters = ["adapter1", "adapter2", "adapter3"]
weights = [0.4, 0.3, 0.3]
adapter_name = "merge"
model.add_weighted_adapter(adapters, weights, adapter_name)
```
It is recommended that the weights sum to 1.0 to preserve the scale of the model. The merged model can then be set as the active model using the `set_adapter` method:
```py
model.set_adapter("merge")
```

@ -0,0 +1,195 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Quantization
Quantization represents data with fewer bits, making it a useful technique for reducing memory usage and accelerating inference, especially for large language models (LLMs). There are several ways to quantize a model including:
* optimizing which model weights are quantized with the [AWQ](https://hf.co/papers/2306.00978) algorithm
* independently quantizing each row of a weight matrix with the [GPTQ](https://hf.co/papers/2210.17323) algorithm
* quantizing to 8-bit and 4-bit precision with the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) library
* quantizing to as low as 2-bit precision with the [AQLM](https://arxiv.org/abs/2401.06118) algorithm
However, after a model is quantized it isn't typically further trained for downstream tasks because training can be unstable due to the lower precision of the weights and activations. But since PEFT methods only add *extra* trainable parameters, this allows you to train a quantized model with a PEFT adapter on top! Combining quantization with PEFT can be a good strategy for training even the largest models on a single GPU. For example, [QLoRA](https://hf.co/papers/2305.14314) is a method that quantizes a model to 4-bits and then trains it with LoRA. This method allows you to finetune a 65B parameter model on a single 48GB GPU!
In this guide, you'll see how to quantize a model to 4-bits and train it with LoRA.
## Quantize a model
[bitsandbytes](https://github.com/TimDettmers/bitsandbytes) is a quantization library with a Transformers integration. With this integration, you can quantize a model to 8 or 4-bits and enable many other options by configuring the [`~transformers.BitsAndBytesConfig`] class. For example, you can:
* set `load_in_4bit=True` to quantize the model to 4-bits when you load it
* set `bnb_4bit_quant_type="nf4"` to use a special 4-bit data type for weights initialized from a normal distribution
* set `bnb_4bit_use_double_quant=True` to use a nested quantization scheme to quantize the already quantized weights
* set `bnb_4bit_compute_dtype=torch.bfloat16` to use bfloat16 for faster computation
```py
import torch
from transformers import BitsAndBytesConfig
config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```
Pass the `config` to the [`~transformers.AutoModelForCausalLM.from_pretrained`] method.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", quantization_config=config)
```
Next, you should call the [`~peft.utils.prepare_model_for_kbit_training`] function to preprocess the quantized model for training.
```py
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
```
Now that the quantized model is ready, let's set up a configuration.
## LoraConfig
Create a [`LoraConfig`] with the following parameters (or choose your own):
```py
from peft import LoraConfig
config = LoraConfig(
    r=16,
    lora_alpha=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
```
Then use the [`get_peft_model`] function to create a [`PeftModel`] from the quantized model and configuration.
```py
from peft import get_peft_model
model = get_peft_model(model, config)
```
You're all set for training with whichever training method you prefer!
### LoftQ initialization
[LoftQ](https://hf.co/papers/2310.08659) initializes LoRA weights such that the quantization error is minimized, and it can improve performance when training quantized models. To get started, follow [these instructions](https://github.com/huggingface/peft/tree/main/examples/loftq_finetuning).
In general, for LoftQ to work best, it is recommended to target as many layers with LoRA as possible, since those not targeted cannot have LoftQ applied. This means that passing `LoraConfig(..., target_modules="all-linear")` will most likely give the best results. Also, you should use `nf4` as quant type in your quantization config when using 4bit quantization, i.e. `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")`.
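As a rough sketch (hyperparameter values are illustrative, not prescriptive), LoftQ initialization is configured through [`LoftQConfig`] and `init_lora_weights="loftq"`. Note that the base model is loaded *without* a quantization config here, since LoftQ quantizes it itself while initializing the LoRA weights:
```py
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

# load the base model without a quantization config; LoftQ quantizes internally
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
loftq_config = LoftQConfig(loftq_bits=4)
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=loftq_config,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
```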
### QLoRA-style training
QLoRA adds trainable weights to all the linear layers in the transformer architecture. Since the attribute names for these linear layers can vary across architectures, set `target_modules` to `"all-linear"` to add LoRA to all the linear layers:
```py
config = LoraConfig(target_modules="all-linear", ...)
```
## AQLM quantization
Additive Quantization of Language Models ([AQLM](https://arxiv.org/abs/2401.06118)) is a Large Language Models compression method. It quantizes multiple weights together and takes advantage of interdependencies between them. AQLM represents groups of 8-16 weights as a sum of multiple vector codes. This allows it to compress models down to as low as 2-bit with considerably low accuracy losses.
Since the AQLM quantization process is computationally expensive, using prequantized models is recommended. A partial list of available models can be found in the official aqlm [repository](https://github.com/Vahe1994/AQLM).
These models support LoRA adapter tuning. To tune a quantized model, you'll need to install the `aqlm` inference library: `pip install aqlm>=1.0.2`. Finetuned LoRA adapters must be saved separately, as merging them with the AQLM-quantized weights is not possible.
```py
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

quantized_model = AutoModelForCausalLM.from_pretrained(
    "BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf-test-dispatch",
    torch_dtype="auto", device_map="auto", low_cpu_mem_usage=True,
)
peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```
You can refer to the [Google Colab](https://colab.research.google.com/drive/12GTp1FCj5_0SnnNQH18h_2XFh9vS_guX?usp=sharing) example for an overview of AQLM+LoRA finetuning.
## EETQ quantization
You can also perform LoRA fine-tuning on EETQ quantized models. The [EETQ](https://github.com/NetEase-FuXi/EETQ) package offers a simple and efficient way to perform 8-bit quantization, which is claimed to be faster than the `LLM.int8()` algorithm. First, make sure that you have a Transformers version that is compatible with EETQ (e.g. by installing it from the latest PyPI release or from source).
```py
import torch
from transformers import EetqConfig
config = EetqConfig("int8")
```
Pass the `config` to the [`~transformers.AutoModelForCausalLM.from_pretrained`] method.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", quantization_config=config)
```
and create a `LoraConfig` and pass it to `get_peft_model`:
```py
from peft import LoraConfig, get_peft_model
config = LoraConfig(
    r=16,
    lora_alpha=8,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, config)
```
## HQQ quantization
Models quantized with Half-Quadratic Quantization of Large Machine Learning Models ([HQQ](https://mobiusml.github.io/hqq_blog/)) support LoRA adapter tuning. To tune the quantized model, you'll need to install the `hqq` library: `pip install hqq`.
```python
from hqq.engine.hf import HQQModelForCausalLM
quantized_model = HQQModelForCausalLM.from_quantized(save_dir_or_hfhub, device='cuda')
peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```
Alternatively, use a Transformers version that is compatible with HQQ (e.g. by installing it from the latest PyPI release or from source):
```python
from transformers import HqqConfig, AutoModelForCausalLM
quant_config = HqqConfig(nbits=4, group_size=64)
quantized_model = AutoModelForCausalLM.from_pretrained(save_dir_or_hfhub, device_map=device_map, quantization_config=quant_config)
peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```
## Next steps
If you're interested in learning more about quantization, the following may be helpful:
* Learn more about details about QLoRA and check out some benchmarks on its impact in the [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes) blog post.
* Read more about different quantization schemes in the Transformers [Quantization](https://hf.co/docs/transformers/main/quantization) guide.

@ -0,0 +1,76 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# torch.compile
In PEFT, [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) works for some but not all features. It doesn't always work because PEFT is highly dynamic in certain places (loading and switching between multiple adapters, for instance), which can cause trouble for `torch.compile`. In other places, `torch.compile` may work, but won't be as fast as expected because of graph breaks.
If you don't see an error, it doesn't necessarily mean that `torch.compile` worked correctly. It might give you an output, but the output could be incorrect. This guide describes what works with `torch.compile` and what doesn't.
> [!TIP]
> Unless indicated otherwise, the default `torch.compile` settings were used.
## Training and inference with `torch.compile`
These features **work** with `torch.compile`. Everything listed below was tested with a causal LM:
- Training with `Trainer` from 🤗 transformers
- Training with a custom PyTorch loop
- Inference
- Generation
The following adapters were tested successfully:
- AdaLoRA
- BOFT
- IA³
- Layer Norm Tuning
- LoHa
- LoRA
- LoRA + DoRA
- OFT
- VeRA
- HRA
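As a reference for the basic pattern, here is a minimal sketch of compiling a LoRA model for inference (the model id and config values are assumptions, not recommendations):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "facebook/opt-125m"  # example model, chosen only for its small size
base_model = AutoModelForCausalLM.from_pretrained(model_id)
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM"))
model = torch.compile(model)

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # the first call triggers compilation
```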
The following adapters **don't work** correctly for training or inference when using `torch.compile`:
- LoKr
- LoRA targeting embedding layers
## Advanced PEFT features with `torch.compile`
Below are some of the more advanced PEFT features that **work**. They were all tested with LoRA.
- `modules_to_save` (i.e. `config = LoraConfig(..., modules_to_save=...)`)
- Merging adapters (one or multiple)
- Merging multiple adapters into one adapter (i.e. calling `model.add_weighted_adapter(...)`)
Generally, we can expect that if a feature works correctly with LoRA and is also supported by other adapter types, it should also work for that adapter type.
The more advanced PEFT features below **don't work** in conjunction with `torch.compile`. Tests were run with LoRA:
- Using PEFT adapters with quantization (bitsandbytes)
- Inference with multiple adapters
- Unloading (i.e. calling `model.merge_and_unload()`)
- Disabling adapters (i.e. using `with model.disable_adapter()`)
- Mixed adapter batches (i.e. calling `model(batch, adapter_names=["__base__", "default", "other", ...])`)
## Test cases
All the use cases listed above are tested inside of [`peft/tests/test_torch_compile.py`](https://github.com/huggingface/peft/blob/main/tests/test_torch_compile.py). If you want to check in more detail how we tested a certain feature, please go to that file and check the test that corresponds to your use case.
> [!TIP]
> If you have another use case where you know that `torch.compile` does or does not work with PEFT, please contribute by letting us know or by opening a PR to add this use case to the covered test cases.

@ -0,0 +1,286 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Troubleshooting
If you encounter any issue when using PEFT, please check the following list of common issues and their solutions.
## Examples don't work
Examples often rely on the most recent package versions, so please ensure they're up-to-date. In particular, check the following package versions:
- `peft`
- `transformers`
- `accelerate`
- `torch`
In general, you can update the package version by running this command inside your Python environment:
```bash
python -m pip install -U <package_name>
```
Installing PEFT from source is useful for keeping up with the latest developments:
```bash
python -m pip install git+https://github.com/huggingface/peft
```
## ValueError: Attempting to unscale FP16 gradients
This error probably occurred because the model was loaded with `torch_dtype=torch.float16` and then used in an automatic mixed precision (AMP) context, e.g. by setting `fp16=True` in the [`~transformers.Trainer`] class from 🤗 Transformers. The reason is that when using AMP, trainable weights should never use fp16. To make this work without loading the whole model in fp32, add the following to your code:
```python
peft_model = get_peft_model(...)

# add this:
for param in peft_model.parameters():
    if param.requires_grad:
        param.data = param.data.float()

# proceed as usual
trainer = Trainer(model=peft_model, fp16=True, ...)
trainer.train()
```
Alternatively, you can use the [`~utils.cast_mixed_precision_params`] function to correctly cast the weights:
```python
from peft import cast_mixed_precision_params
peft_model = get_peft_model(...)
cast_mixed_precision_params(peft_model, dtype=torch.float16)
# proceed as usual
trainer = Trainer(model=peft_model, fp16=True, ...)
trainer.train()
```
<Tip>
Starting from PEFT version v0.12.0, PEFT automatically promotes the dtype of adapter weights from `torch.float16` and `torch.bfloat16` to `torch.float32` where appropriate. To _prevent_ this behavior, you can pass `autocast_adapter_dtype=False` to [`~get_peft_model`], to [`~PeftModel.from_pretrained`], and to [`~PeftModel.load_adapter`].
</Tip>
## Bad results from a loaded PEFT model
There can be several reasons for getting a poor result from a loaded PEFT model which are listed below. If you're still unable to troubleshoot the problem, see if anyone else had a similar [issue](https://github.com/huggingface/peft/issues) on GitHub, and if you can't find any, open a new issue.
When opening an issue, it helps a lot if you provide a minimal code example that reproduces the issue. Also, please report if the loaded model performs at the same level as the model did before fine-tuning, if it performs at a random level, or if it is only slightly worse than expected. This information helps us identify the problem more quickly.
### Random deviations
If your model outputs are not exactly the same as previous runs, there could be an issue with random elements. For example:
1. please ensure the model is in `.eval()` mode, which is important, for instance, if the model uses dropout
2. if you use [`~transformers.GenerationMixin.generate`] on a language model, there could be random sampling, so obtaining the same result requires setting a random seed
3. if you used quantization and merged the weights, small deviations are expected due to rounding errors
### Incorrectly loaded model
Please ensure that you load the model correctly. A common error is trying to load a _trained_ model with [`get_peft_model`] which is incorrect. Instead, the loading code should look like this:
```python
from peft import PeftModel, PeftConfig
base_model = ... # to load the base model, use the same code as when you trained it
config = PeftConfig.from_pretrained(peft_model_id)
peft_model = PeftModel.from_pretrained(base_model, peft_model_id)
```
### Randomly initialized layers
For some tasks, it is important to correctly configure `modules_to_save` in the config to account for randomly initialized layers.
As an example, this is necessary if you use LoRA to fine-tune a language model for sequence classification because 🤗 Transformers adds a randomly initialized classification head on top of the model. If you do not add this layer to `modules_to_save`, the classification head won't be saved. The next time you load the model, you'll get a _different_ randomly initialized classification head, resulting in completely different results.
PEFT tries to correctly guess the `modules_to_save` if you provide the `task_type` argument in the config. This should work for transformers models that follow the standard naming scheme. It is always a good idea to double check though because we can't guarantee all models follow the naming scheme.
When you load a transformers model that has randomly initialized layers, you should see a warning along the lines of:
```
Some weights of <MODEL> were not initialized from the model checkpoint at <ID> and are newly initialized: [<LAYER_NAMES>].
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```
The mentioned layers should be added to `modules_to_save` in the config to avoid the described problem.
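For example, for a sequence classification task, the config could look like the sketch below (the module and head names are architecture-dependent assumptions, so check the warning message for the actual head name):
```python
from peft import LoraConfig

config = LoraConfig(
    task_type="SEQ_CLS",
    target_modules=["q_proj", "v_proj"],
    # name of the randomly initialized classification head reported in the warning;
    # "classifier" is common but depends on the model architecture
    modules_to_save=["classifier"],
)
```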
### Extending the vocabulary
For many language fine-tuning tasks, extending the model's vocabulary is necessary since new tokens are being introduced. This requires extending the embedding layer to account for the new tokens and also storing the embedding layer in addition to the adapter weights when saving the adapter.
Save the embedding layer by adding it to the `target_modules` of the config. The embedding layer name must follow the standard naming scheme from Transformers. For example, the Mistral config could look like this:
```python
config = LoraConfig(..., target_modules=["embed_tokens", "lm_head", "q_proj", "v_proj"])
```
Once added to `target_modules`, PEFT automatically stores the embedding layer when saving the adapter if the model has the [`~transformers.PreTrainedModel.get_input_embeddings`] and [`~transformers.PreTrainedModel.get_output_embeddings`] methods. This is generally the case for Transformers models.
If the model's embedding layer doesn't follow the Transformers naming scheme, you can still save it by manually passing `save_embedding_layers=True` when saving the adapter:
```python
model = get_peft_model(...)
# train the model
model.save_pretrained("my_adapter", save_embedding_layers=True)
```
For inference, load the base model first and resize it the same way you did before you trained the model. After you've resized the base model, you can load the PEFT checkpoint.
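A minimal sketch of the loading order, assuming the tokenizer with the added tokens was saved alongside the adapter in `"my_adapter"`:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("my_adapter")  # tokenizer with the added tokens
base_model = AutoModelForCausalLM.from_pretrained(...)  # same base model as used for training
# resize the embeddings to the extended vocabulary *before* loading the adapter
base_model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(base_model, "my_adapter")
```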
For a complete example, please check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_with_additional_tokens.ipynb).
### Check layer and model status
Sometimes a PEFT model can end up in a bad state, especially when handling multiple adapters. There can be some confusion around what adapters exist, which one is active, which one is merged, etc. To help investigate this issue, call the [`~peft.PeftModel.get_layer_status`] and the [`~peft.PeftModel.get_model_status`] methods.
The [`~peft.PeftModel.get_layer_status`] method gives you a detailed overview of each targeted layer's active, merged, and available adapters.
```python
>>> from transformers import AutoModel
>>> from peft import get_peft_model, LoraConfig
>>> model_id = "google/flan-t5-small"
>>> model = AutoModel.from_pretrained(model_id)
>>> model = get_peft_model(model, LoraConfig())
>>> model.get_layer_status()
[TunerLayerStatus(name='model.encoder.block.0.layer.0.SelfAttention.q',
                  module_type='lora.Linear',
                  enabled=True,
                  active_adapters=['default'],
                  merged_adapters=[],
                  requires_grad={'default': True},
                  available_adapters=['default']),
 TunerLayerStatus(name='model.encoder.block.0.layer.0.SelfAttention.v',
                  module_type='lora.Linear',
                  enabled=True,
                  active_adapters=['default'],
                  merged_adapters=[],
                  requires_grad={'default': True},
                  available_adapters=['default']),
...]
>>> model.get_model_status()
TunerModelStatus(
    base_model_type='T5Model',
    adapter_model_type='LoraModel',
    peft_types={'default': 'LORA'},
    trainable_params=344064,
    total_params=60855680,
    num_adapter_layers=48,
    enabled=True,
    active_adapters=['default'],
    merged_adapters=[],
    requires_grad={'default': True},
    available_adapters=['default'],
)
```
In the model state output, you should look out for entries that say `"irregular"`. This means PEFT detected an inconsistent state in the model. For instance, if `merged_adapters="irregular"`, it means that for at least one adapter, it was merged on some target modules but not on others. The inference results will most likely be incorrect as a result.
The best way to resolve this issue is to reload the whole model and adapter checkpoint(s). Ensure that you don't perform any incorrect operations on the model, e.g. manually merging adapters on some modules but not others.
Convert the layer status into a pandas `DataFrame` for an easier visual inspection.
```python
from dataclasses import asdict
import pandas as pd
df = pd.DataFrame(asdict(layer) for layer in model.get_layer_status())
```
It is possible to get this information for non-PEFT models if they are using PEFT layers under the hood, but some information like the `base_model_type` or the `peft_types` cannot be determined in that case. As an example, you can call this on a [diffusers](https://huggingface.co/docs/diffusers/index) model like so:
```python
>>> import torch
>>> from diffusers import StableDiffusionPipeline
>>> from peft import get_model_status, get_layer_status
>>> path = "runwayml/stable-diffusion-v1-5"
>>> lora_id = "takuma104/lora-test-text-encoder-lora-target"
>>> pipe = StableDiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16)
>>> pipe.load_lora_weights(lora_id, adapter_name="adapter-1")
>>> pipe.load_lora_weights(lora_id, adapter_name="adapter-2")
>>> pipe.set_lora_device(["adapter-2"], "cuda")
>>> get_layer_status(pipe.text_encoder)
[TunerLayerStatus(name='text_model.encoder.layers.0.self_attn.k_proj',
                  module_type='lora.Linear',
                  enabled=True,
                  active_adapters=['adapter-2'],
                  merged_adapters=[],
                  requires_grad={'adapter-1': False, 'adapter-2': True},
                  available_adapters=['adapter-1', 'adapter-2'],
                  devices={'adapter-1': ['cpu'], 'adapter-2': ['cuda']}),
 TunerLayerStatus(name='text_model.encoder.layers.0.self_attn.v_proj',
                  module_type='lora.Linear',
                  enabled=True,
                  active_adapters=['adapter-2'],
                  merged_adapters=[],
                  requires_grad={'adapter-1': False, 'adapter-2': True},
                  devices={'adapter-1': ['cpu'], 'adapter-2': ['cuda']}),
...]
>>> get_model_status(pipe.unet)
TunerModelStatus(
    base_model_type='other',
    adapter_model_type='None',
    peft_types={},
    trainable_params=797184,
    total_params=861115332,
    num_adapter_layers=128,
    enabled=True,
    active_adapters=['adapter-2'],
    merged_adapters=[],
    requires_grad={'adapter-1': False, 'adapter-2': True},
    available_adapters=['adapter-1', 'adapter-2'],
    devices={'adapter-1': ['cpu'], 'adapter-2': ['cuda']},
)
```
## Speed
### Loading adapter weights is slow
Loading adapters like LoRA weights should generally be fast compared to loading the base model. However, there can be use cases where the adapter weights are quite large or where users need to load a large number of adapters -- the loading time can add up in this case. The reason for this is that the adapter weights are first initialized and then overridden by the loaded weights, which is wasteful. To speed up the loading time, you can pass the `low_cpu_mem_usage=True` argument to [`~PeftModel.from_pretrained`] and [`~PeftModel.load_adapter`].
<Tip>
If this option works well across different use cases, it may become the default for adapter loading in the future.
</Tip>
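A minimal sketch of what this looks like (the checkpoint ids are placeholders):
```python
from peft import PeftModel

base_model = ...  # load the base model as usual
peft_model = PeftModel.from_pretrained(base_model, "peft-checkpoint-1", low_cpu_mem_usage=True)
# the same argument can be passed when loading additional adapters
peft_model.load_adapter("peft-checkpoint-2", adapter_name="other", low_cpu_mem_usage=True)
```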
## Reproducibility
### Models using batch norm
When loading a trained PEFT model where the base model uses batch norm (e.g. `torch.nn.BatchNorm1d` or `torch.nn.BatchNorm2d`), you may find that you cannot reproduce the exact same outputs. This is because the batch norm layers keep track of running stats during training, but these stats are not part of the PEFT checkpoint. Therefore, when you load the PEFT model, the running stats of the base model will be used (i.e. from before training with PEFT).
Depending on your use case, this may not be a big deal. If, however, you need your outputs to be 100% reproducible, you can achieve this by adding the batch norm layers to `modules_to_save`. Below is an example of this using resnet and LoRA. Notice that we set `modules_to_save=["classifier", "normalization"]`. We need the `"classifier"` argument because our task is image classification, and we add the `"normalization"` argument to ensure that the batch norm layers are saved in the PEFT checkpoint.
```python
from transformers import AutoModelForImageClassification
from peft import LoraConfig, get_peft_model

model_id = "microsoft/resnet-18"
base_model = AutoModelForImageClassification.from_pretrained(model_id)
config = LoraConfig(
    target_modules=["convolution"],
    modules_to_save=["classifier", "normalization"],
)
model = get_peft_model(base_model, config)
```
Depending on the type of model you use, the batch norm layers could have different names than `"normalization"`, so please ensure that the name matches your model architecture.

docs/source/index.md
@ -0,0 +1,49 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT
🤗 PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting large pretrained models to various downstream applications without fine-tuning all of a model's parameters because it is prohibitively costly. PEFT methods only fine-tune a small number of (extra) model parameters - significantly decreasing computational and storage costs - while yielding performance comparable to a fully fine-tuned model. This makes it more accessible to train and store large language models (LLMs) on consumer hardware.
PEFT is integrated with the Transformers, Diffusers, and Accelerate libraries to provide a faster and easier way to load, train, and use large models for inference.
<div class="mt-10">
<div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="quicktour"
><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Get started</div>
<p class="text-gray-700">Start here if you're new to 🤗 PEFT to get an overview of the library's main features, and how to train a model with a PEFT method.</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./task_guides/image_classification_lora"
><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
<p class="text-gray-700">Practical guides demonstrating how to apply various PEFT methods across different types of tasks like image classification, causal language modeling, automatic speech recognition, and more. Learn how to use 🤗 PEFT with the DeepSpeed and Fully Sharded Data Parallel scripts.</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./conceptual_guides/lora"
><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Conceptual guides</div>
<p class="text-gray-700">Get a better theoretical understanding of how LoRA and various soft prompting methods help reduce the number of trainable parameters to make training more efficient.</p>
</a>
<a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./package_reference/config"
><div class="w-full text-center bg-gradient-to-br from-purple-400 to-purple-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Reference</div>
<p class="text-gray-700">Technical descriptions of how 🤗 PEFT classes and methods work.</p>
</a>
</div>
</div>
<iframe
src="https://stevhliu-peft-methods.hf.space"
frameborder="0"
width="850"
height="620"
></iframe>

docs/source/install.md
@ -0,0 +1,47 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Installation
Before you start, you will need to set up your environment, install the appropriate packages, and configure 🤗 PEFT. 🤗 PEFT is tested on **Python 3.8+**.
🤗 PEFT is available on PyPI, as well as GitHub:
## PyPI
To install 🤗 PEFT from PyPI:
```bash
pip install peft
```
## Source
New features that haven't been released yet are added every day, which also means there may be some bugs. To try them out, install from the GitHub repository:
```bash
pip install git+https://github.com/huggingface/peft
```
If you're working on contributing to the library or wish to play with the source code and see live
results as you run the code, an editable version can be installed from a locally-cloned version of the
repository:
```bash
git clone https://github.com/huggingface/peft
cd peft
pip install -e .
```

@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# AdaLoRA
[AdaLoRA](https://hf.co/papers/2303.10512) is a method for optimizing the number of trainable parameters to assign to weight matrices and layers, unlike LoRA, which distributes parameters evenly across all modules. More parameters are budgeted for important weight matrices and layers while less important ones receive fewer parameters.
The abstract from the paper is:
*Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks are present. Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e.g., low-rank increments. These methods often evenly distribute the budget of incremental updates across all pre-trained weight matrices, and overlook the varying importance of different weight parameters. As a consequence, the fine-tuning performance is suboptimal. To bridge this gap, we propose AdaLoRA, which adaptively allocates the parameter budget among weight matrices according to their importance score. In particular, AdaLoRA parameterizes the incremental updates in the form of singular value decomposition. Such a novel approach allows us to effectively prune the singular values of unimportant updates, which is essentially to reduce their parameter budget but circumvent intensive exact SVD computations. We conduct extensive experiments with several pre-trained models on natural language processing, question answering, and natural language generation to validate the effectiveness of AdaLoRA. Results demonstrate that AdaLoRA manifests notable improvement over baselines, especially in the low budget settings. Our code is publicly available at https://github.com/QingruZhang/AdaLoRA*.
## AdaLoraConfig
[[autodoc]] tuners.adalora.config.AdaLoraConfig
## AdaLoraModel
[[autodoc]] tuners.adalora.model.AdaLoraModel

@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LyCORIS
[LyCORIS](https://hf.co/papers/2309.14859) (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion) are LoRA-like matrix decomposition adapters that modify the cross-attention layer of the UNet. The [LoHa](loha) and [LoKr](lokr) methods inherit from the `Lycoris` classes here.
## LycorisConfig
[[autodoc]] tuners.lycoris_utils.LycorisConfig
## LycorisLayer
[[autodoc]] tuners.lycoris_utils.LycorisLayer
## LycorisTuner
[[autodoc]] tuners.lycoris_utils.LycorisTuner

@ -0,0 +1,48 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# AutoPeftModels
The `AutoPeftModel` classes load the appropriate PEFT model for the task type by automatically inferring it from the configuration file. They are designed to quickly and easily load a PEFT model in a single line of code without having to worry about which exact model class you need or manually loading a [`PeftConfig`].
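For example, a causal LM adapter can be loaded like this (the checkpoint id is only an example):
```py
from peft import AutoPeftModelForCausalLM

# the base model and the task-specific PEFT model class are inferred from the adapter config
model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")
```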
## AutoPeftModel
[[autodoc]] auto.AutoPeftModel
- from_pretrained
## AutoPeftModelForCausalLM
[[autodoc]] auto.AutoPeftModelForCausalLM
## AutoPeftModelForSeq2SeqLM
[[autodoc]] auto.AutoPeftModelForSeq2SeqLM
## AutoPeftModelForSequenceClassification
[[autodoc]] auto.AutoPeftModelForSequenceClassification
## AutoPeftModelForTokenClassification
[[autodoc]] auto.AutoPeftModelForTokenClassification
## AutoPeftModelForQuestionAnswering
[[autodoc]] auto.AutoPeftModelForQuestionAnswering
## AutoPeftModelForFeatureExtraction
[[autodoc]] auto.AutoPeftModelForFeatureExtraction

@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# BOFT
[Orthogonal Butterfly (BOFT)](https://hf.co/papers/2311.06243) is a generic method designed for finetuning foundation models. It improves the parameter efficiency of the finetuning paradigm -- Orthogonal Finetuning (OFT) -- by taking inspiration from the Cooley-Tukey fast Fourier transform, showing favorable results across finetuning different foundation models, including large vision transformers, large language models, and text-to-image diffusion models.
The abstract from the paper is:
*Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language*.
## BOFTConfig
[[autodoc]] tuners.boft.config.BOFTConfig
## BOFTModel
[[autodoc]] tuners.boft.model.BOFTModel

@ -0,0 +1,22 @@
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Configuration
[`PeftConfigMixin`] is the base configuration class for storing the adapter configuration of a [`PeftModel`], and [`PromptLearningConfig`] is the base configuration class for soft prompt methods (p-tuning, prefix tuning, and prompt tuning). These base classes contain methods for saving and loading model configurations from the Hub, specifying the PEFT method to use, type of task to perform, and model configurations like number of layers and number of attention heads.
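For example, a config can be loaded from the Hub and inspected like this (the checkpoint id is only an example):
```py
from peft import PeftConfig

config = PeftConfig.from_pretrained("ybelkada/opt-350m-lora")
print(config.peft_type, config.task_type, config.base_model_name_or_path)
```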
## PeftConfigMixin
[[autodoc]] config.PeftConfigMixin
- all
## PeftConfig
[[autodoc]] PeftConfig
- all
## PromptLearningConfig
[[autodoc]] PromptLearningConfig
- all

@ -0,0 +1,38 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# FourierFT: Discrete Fourier Transformation Fine-Tuning
[FourierFT](https://huggingface.co/papers/2405.03003) is a parameter-efficient fine-tuning technique that leverages the Discrete Fourier Transform to compress the model's tunable weights. This method outperforms LoRA on the GLUE benchmark and common ViT classification tasks while using far fewer parameters.
FourierFT currently has the following constraints:
- Only `nn.Linear` layers are supported.
- Quantized layers are not supported.
If these constraints don't work for your use case, consider other methods instead.
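A minimal configuration sketch (the target module names and `n_frequency` value are illustrative assumptions):
```py
from peft import FourierFTConfig, get_peft_model

base_model = ...  # a transformers model whose targeted layers are nn.Linear
config = FourierFTConfig(
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
    n_frequency=1000,  # number of trainable spectral coefficients per layer
)
model = get_peft_model(base_model, config)
```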
The abstract from the paper is:
> Low-rank adaptation (LoRA) has recently gained much interest in fine-tuning foundation models. It effectively reduces the number of trainable parameters by incorporating low-rank matrices A and B to represent the weight change, i.e., Delta W=BA. Despite LoRA's progress, it faces storage challenges when handling extensive customization adaptations or larger base models. In this work, we aim to further compress trainable parameters by enjoying the powerful expressiveness of the Fourier transform. Specifically, we introduce FourierFT, which treats Delta W as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients. With the trained spectral coefficients, we implement the inverse discrete Fourier transform to recover Delta W. Empirically, our FourierFT method shows comparable or better performance with fewer parameters than LoRA on various tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, when performing instruction tuning on the LLaMA2-7B model, FourierFT surpasses LoRA with only 0.064M trainable parameters, compared to LoRA's 33.5M.
## FourierFTConfig
[[autodoc]] tuners.fourierft.config.FourierFTConfig
## FourierFTModel
[[autodoc]] tuners.fourierft.model.FourierFTModel

@ -0,0 +1,17 @@
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Helper methods
A collection of helper functions for PEFT.
## Checking if a model is a PEFT model
[[autodoc]] helpers.check_if_peft_model
- all
## Temporarily Rescaling Adapter Scale in LoraLayer Modules
[[autodoc]] helpers.rescale_adapter_scale
- all

@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# IA3
Infused Adapter by Inhibiting and Amplifying Inner Activations, or [IA3](https://hf.co/papers/2205.05638), is a method that adds three learned vectors to rescale the keys and values of the self-attention and encoder-decoder attention layers, and the intermediate activation of the position-wise feed-forward network.
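A minimal configuration sketch (the module names are assumptions typical for a decoder-only transformer and may need to be adapted to your architecture):
```py
from peft import IA3Config, get_peft_model

base_model = ...  # load the base model, e.g. from transformers
config = IA3Config(
    task_type="CAUSAL_LM",
    target_modules=["k_proj", "v_proj", "down_proj"],
    # feedforward_modules must be a subset of target_modules
    feedforward_modules=["down_proj"],
)
model = get_peft_model(base_model, config)
```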
The abstract from the paper is:
*Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and PEFT and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new PEFT method called (IA)^3 that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny amount of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available*.
## IA3Config
[[autodoc]] tuners.ia3.config.IA3Config
## IA3Model
[[autodoc]] tuners.ia3.model.IA3Model

@ -0,0 +1,34 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LayerNorm Tuning
LayerNorm Tuning ([LN Tuning](https://huggingface.co/papers/2312.11420)) is a PEFT method that only fine-tunes the parameters of the LayerNorm layers in a model.
The paper has tested the performance of this method on large language models and has shown that it can achieve strong performance with a significant reduction in the number of trainable parameters and GPU memory usage.
However, the method is not limited to language models and can be applied to any model that uses LayerNorm layers.
In this implementation, all LayerNorm layers in the model are fine-tuned by default, but the method can also be pointed at other layer types, such as `MLP` or `Attention` layers, by specifying `target_modules` in the `LNTuningConfig`.
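As a rough sketch (the base model and the module names passed to `target_modules` are assumptions based on the OPT architecture, not required values), applying LN Tuning looks like any other PEFT method:
```py
from transformers import AutoModelForCausalLM
from peft import LNTuningConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
# OPT's LayerNorm module names; omit target_modules to fall back to the built-in defaults where available
config = LNTuningConfig(
    task_type="CAUSAL_LM",
    target_modules=["self_attn_layer_norm", "final_layer_norm"],
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```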
The abstract from the paper is:
*This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text understanding to embracing multiple modalities, we intriguingly note that, within each attention block, tuning LayerNorm suffices to yield strong performance. Moreover, when benchmarked against other tuning approaches like full parameter finetuning or LoRA, its benefits on efficiency are substantial. For example, when compared to LoRA on a 13B model scale, performance can be enhanced by an average of over 20% across five multi-modal tasks, and meanwhile, results in a significant reduction of trainable parameters by 41.9% and a decrease in GPU memory usage by 17.6%. On top of this LayerNorm strategy, we showcase that selectively tuning only with conversational data can improve efficiency further. Beyond these empirical outcomes, we provide a comprehensive analysis to explore the role of LayerNorm in adapting LLMs to the multi-modal domain and improving the expressive power of the model.*
## LNTuningConfig
[[autodoc]] tuners.ln_tuning.config.LNTuningConfig
## LNTuningModel
[[autodoc]] tuners.ln_tuning.model.LNTuningModel


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Llama-Adapter
[Llama-Adapter](https://hf.co/papers/2303.16199) is a PEFT method specifically designed for turning Llama into an instruction-following model. The Llama model is frozen and only a set of adaptation prompts prefixed to the input instruction tokens are learned. Since randomly initialized modules inserted into the model can cause the model to lose some of its existing knowledge, Llama-Adapter uses zero-initialized attention with zero gating to progressively add the instructional prompts to the model.
The abstract from the paper is:
*We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the input text tokens at higher transformer layers. Then, a zero-init attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserves its pre-trained knowledge. With efficient training, LLaMA-Adapter generates high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Furthermore, our approach can be simply extended to multi-modal input, e.g., images, for image-conditioned LLaMA, which achieves superior reasoning capacity on ScienceQA. We release our code at https://github.com/ZrrSkywalker/LLaMA-Adapter*.
## AdaptionPromptConfig
[[autodoc]] tuners.adaption_prompt.config.AdaptionPromptConfig
## AdaptionPromptModel
[[autodoc]] tuners.adaption_prompt.model.AdaptionPromptModel


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoHa
Low-Rank Hadamard Product ([LoHa](https://huggingface.co/papers/2108.06098)) is similar to LoRA except it approximates the large weight matrix with more low-rank matrices and combines them with the Hadamard product. This method is even more parameter-efficient than LoRA and achieves comparable performance.
The abstract from the paper is:
*In this work, we propose a communication-efficient parameterization, FedPara, for federated learning (FL) to overcome the burdens on frequent model uploads and downloads. Our method re-parameterizes weight parameters of layers using low-rank weights followed by the Hadamard product. Compared to the conventional low-rank parameterization, our FedPara method is not restricted to low-rank constraints, and thereby it has a far larger capacity. This property enables to achieve comparable performance while requiring 3 to 10 times lower communication costs than the model with the original layers, which is not achievable by the traditional low-rank methods. The efficiency of our method can be further improved by combining with other efficient FL optimizers. In addition, we extend our method to a personalized FL application, pFedPara, which separates parameters into global and local ones. We show that pFedPara outperforms competing personalized FL methods with more than three times fewer parameters*.
## LoHaConfig
[[autodoc]] tuners.loha.config.LoHaConfig
## LoHaModel
[[autodoc]] tuners.loha.model.LoHaModel


@ -0,0 +1,27 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoKr
Low-Rank Kronecker Product ([LoKr](https://hf.co/papers/2309.14859)) is a LoRA-variant method that approximates the large weight matrix with two low-rank matrices and combines them with the Kronecker product. LoKr also supports an optional third low-rank matrix for better control during fine-tuning.
## LoKrConfig
[[autodoc]] tuners.lokr.config.LoKrConfig
## LoKrModel
[[autodoc]] tuners.lokr.model.LoKrModel


@ -0,0 +1,35 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoRA
Low-Rank Adaptation ([LoRA](https://huggingface.co/papers/2309.15223)) is a PEFT method that decomposes a large matrix into two smaller low-rank matrices in the attention layers. This drastically reduces the number of parameters that need to be fine-tuned.
The abstract from the paper is:
*We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation Rescore-BERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets with decreased training times by factors between 5.4 and 3.6.*.
## LoraConfig
[[autodoc]] tuners.lora.config.LoraConfig
## LoraModel
[[autodoc]] tuners.lora.model.LoraModel
## Utility
[[autodoc]] utils.loftq_utils.replace_lora_weights_loftq


@ -0,0 +1,33 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Model merge
PEFT provides several internal utilities for [merging LoRA adapters](../developer_guides/model_merging) with the TIES and DARE methods.
[[autodoc]] utils.merge_utils.prune
[[autodoc]] utils.merge_utils.calculate_majority_sign_mask
[[autodoc]] utils.merge_utils.disjoint_merge
[[autodoc]] utils.merge_utils.task_arithmetic
[[autodoc]] utils.merge_utils.ties
[[autodoc]] utils.merge_utils.dare_linear
[[autodoc]] utils.merge_utils.dare_ties
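These utilities are usually not called directly. A common entry point is the `add_weighted_adapter` method of a LoRA model, roughly as sketched below; the base model, adapter paths, names, and hyperparameters are placeholders, not recommendations.
```py
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
model = PeftModel.from_pretrained(base_model, "path/to/adapter_a", adapter_name="a")  # placeholder adapters
model.load_adapter("path/to/adapter_b", adapter_name="b")

# Merge the two adapters with TIES; density is the fraction of values kept during pruning
model.add_weighted_adapter(
    adapters=["a", "b"],
    weights=[1.0, 1.0],
    adapter_name="merged",
    combination_type="ties",
    density=0.2,
)
model.set_adapter("merged")
```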


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Multitask prompt tuning
[Multitask prompt tuning](https://huggingface.co/papers/2303.02861) decomposes the soft prompts of each task into a single learned transferable prompt instead of a separate prompt for each task. The single learned prompt can be adapted for each task by multiplicative low rank updates.
The abstract from the paper is:
*Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks. However, existing methods typically learn soft prompt vectors from scratch, and it has not been clear how to exploit the rich cross-task knowledge with prompt vectors in a multitask learning setting. We propose multitask prompt tuning (MPT), which first learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts. We then learn multiplicative low rank updates to this shared prompt to efficiently adapt it to each downstream target task. Extensive experiments on 23 NLP datasets demonstrate that our proposed approach outperforms the state-of-the-art methods, including the full finetuning baseline in some cases, despite only tuning 0.035% as many task-specific parameters*.
## MultitaskPromptTuningConfig
[[autodoc]] tuners.multitask_prompt_tuning.config.MultitaskPromptTuningConfig
## MultitaskPromptEmbedding
[[autodoc]] tuners.multitask_prompt_tuning.model.MultitaskPromptEmbedding


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# OFT
[Orthogonal Finetuning (OFT)](https://hf.co/papers/2306.07280) is a method developed for adapting text-to-image diffusion models. It works by reparameterizing the pretrained weight matrices with an orthogonal matrix to preserve information in the pretrained model. To reduce the number of parameters, OFT introduces a block-diagonal structure in the orthogonal matrix.
The abstract from the paper is:
*Large text-to-image diffusion models have impressive capabilities in generating photorealistic images from text prompts. How to effectively guide or control these powerful models to perform different downstream tasks becomes an important open problem. To tackle this challenge, we introduce a principled finetuning method -- Orthogonal Finetuning (OFT), for adapting text-to-image diffusion models to downstream tasks. Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere. We find that this property is crucial for preserving the semantic generation ability of text-to-image diffusion models. To improve finetuning stability, we further propose Constrained Orthogonal Finetuning (COFT) which imposes an additional radius constraint to the hypersphere. Specifically, we consider two important finetuning text-to-image tasks: subject-driven generation where the goal is to generate subject-specific images given a few images of a subject and a text prompt, and controllable generation where the goal is to enable the model to take in additional control signals. We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed*.
## OFTConfig
[[autodoc]] tuners.oft.config.OFTConfig
## OFTModel
[[autodoc]] tuners.oft.model.OFTModel


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# P-tuning
[P-tuning](https://hf.co/papers/2103.10385) adds trainable prompt embeddings to the input that are optimized by a prompt encoder to find a better prompt, eliminating the need to manually design prompts. The prompt tokens can be added anywhere in the input sequence, and P-tuning also introduces anchor tokens for improving performance.
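A minimal sketch of configuring P-tuning with PEFT; the base model and hyperparameter values below are placeholders chosen for illustration.
```py
from transformers import AutoModelForCausalLM
from peft import PromptEncoderConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
config = PromptEncoderConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=20,    # number of trainable prompt tokens
    encoder_hidden_size=128,  # hidden size of the prompt encoder
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```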
The abstract from the paper is:
*While GPTs with traditional fine-tuning fail to achieve strong results on natural language understanding (NLU), we show that GPTs can be better than or comparable to similar-sized BERTs on NLU tasks with a novel method P-tuning -- which employs trainable continuous prompt embeddings. On the knowledge probing (LAMA) benchmark, the best GPT recovers 64\% (P@1) of world knowledge without any additional text provided during test time, which substantially improves the previous best by 20+ percentage points. On the SuperGlue benchmark, GPTs achieve comparable and sometimes better performance to similar-sized BERTs in supervised learning. Importantly, we find that P-tuning also improves BERTs' performance in both few-shot and supervised settings while largely reducing the need for prompt engineering. Consequently, P-tuning outperforms the state-of-the-art approaches on the few-shot SuperGlue benchmark.*.
## PromptEncoderConfig
[[autodoc]] tuners.p_tuning.config.PromptEncoderConfig
## PromptEncoder
[[autodoc]] tuners.p_tuning.model.PromptEncoder


@ -0,0 +1,77 @@
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Models
[`PeftModel`] is the base model class for specifying the base Transformer model and configuration to apply a PEFT method to. The base `PeftModel` contains methods for loading and saving models from the Hub.
## PeftModel
[[autodoc]] PeftModel
- all
## PeftModelForSequenceClassification
A `PeftModel` for sequence classification tasks.
[[autodoc]] PeftModelForSequenceClassification
- all
## PeftModelForTokenClassification
A `PeftModel` for token classification tasks.
[[autodoc]] PeftModelForTokenClassification
- all
## PeftModelForCausalLM
A `PeftModel` for causal language modeling.
[[autodoc]] PeftModelForCausalLM
- all
## PeftModelForSeq2SeqLM
A `PeftModel` for sequence-to-sequence language modeling.
[[autodoc]] PeftModelForSeq2SeqLM
- all
## PeftModelForQuestionAnswering
A `PeftModel` for question answering.
[[autodoc]] PeftModelForQuestionAnswering
- all
## PeftModelForFeatureExtraction
A `PeftModel` for extracting features/embeddings from transformer models.
[[autodoc]] PeftModelForFeatureExtraction
- all
## PeftMixedModel
A `PeftModel` for mixing different adapter types (e.g. LoRA and LoHa).
[[autodoc]] PeftMixedModel
- all
## Utilities
[[autodoc]] utils.cast_mixed_precision_params
[[autodoc]] get_peft_model
[[autodoc]] inject_adapter_in_model
[[autodoc]] utils.get_peft_model_state_dict
[[autodoc]] utils.prepare_model_for_kbit_training
[[autodoc]] get_layer_status
[[autodoc]] get_model_status


@ -0,0 +1,27 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT types
[`PeftType`] includes the supported adapters in PEFT, and [`TaskType`] includes PEFT-supported tasks.
## PeftType
[[autodoc]] utils.peft_types.PeftType
## TaskType
[[autodoc]] utils.peft_types.TaskType


@ -0,0 +1,44 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Polytropon
[Polytropon](https://hf.co/papers/2202.13914) is a multitask model with a number of different LoRA adapters in its "inventory". The model learns the correct combination of adapters from the inventory with a routing function to choose the best subset of modules for a specific task. PEFT also supports [Multi-Head Adapter Routing (MHR)](https://hf.co/papers/2211.03831) for Polytropon which builds on and improves the routing function by combining the adapter heads more granularly. The adapter heads are separated into disjoint blocks and a different routing function is learned for each one, allowing for more expressivity.
<hfoptions id="paper">
<hfoption id="Combining Modular Skills in Multitask Learning">
The abstract from the paper is:
*A modular design encourages neural models to disentangle and recombine different facets of knowledge to generalise more systematically to new tasks. In this work, we assume that each task is associated with a subset of latent discrete skills from a (potentially small) inventory. In turn, skills correspond to parameter-efficient (sparse / low-rank) model parameterisations. By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills. To favour non-trivial soft partitions of skills across tasks, we experiment with a series of inductive biases, such as an Indian Buffet Process prior and a two-speed learning rate. We evaluate our latent-skill model on two main settings: 1) multitask reinforcement learning for grounded instruction following on 8 levels of the BabyAI platform; and 2) few-shot adaptation of pre-trained text-to-text generative models on CrossFit, a benchmark comprising 160 NLP tasks. We find that the modular design of a network significantly increases sample efficiency in reinforcement learning and few-shot generalisation in supervised learning, compared to baselines with fully shared, task-specific, or conditionally generated parameters where knowledge is entangled across tasks. In addition, we show how discrete skills help interpretability, as they yield an explicit hierarchy of tasks.*
</hfoption>
<hfoption id="Multi-Head Adapter Routing for Cross-Task Generalization">
The abstract from the paper is:
*Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists in pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (Poly) jointly learns an inventory of adapters and a routing function that selects a (variable-size) subset of adapters for each task during both pre-training and few-shot adaptation. In this paper, we investigate the role that adapter routing plays in its success and design new variants based on our findings. First, we build on the intuition that finer-grained routing provides more expressivity. Hence, we propose MHR (Multi-Head Routing), which combines subsets of adapter parameters and outperforms Poly under a comparable parameter budget; by only fine-tuning the routing function and not the adapters (MHR-z), we achieve competitive performance with extreme parameter efficiency. Second, we find that Poly/MHR performance is a result of better multi-task optimization, rather than modular inductive biases that facilitate adapter recombination and local adaptation, as previously hypothesized. In fact, we find that MHR exhibits higher gradient alignment between tasks than any other method. Since this implies that routing is only crucial during multi-task pre-training, we propose MHR-mu, which discards routing and fine-tunes the average of the pre-trained adapters during few-shot adaptation. This establishes MHR-mu as an effective method for single-adapter fine-tuning.*.
</hfoption>
</hfoptions>
## PolyConfig
[[autodoc]] tuners.poly.config.PolyConfig
## PolyModel
[[autodoc]] tuners.poly.model.PolyModel


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Prefix tuning
[Prefix tuning](https://hf.co/papers/2101.00190) prefixes a series of task-specific vectors to the input sequence that can be learned while keeping the pretrained model frozen. The prefix parameters are inserted in all of the model layers.
The abstract from the paper is:
*Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks. However, it modifies all the language model parameters and therefore necessitates storing a full copy for each task. In this paper, we propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks, which keeps language model parameters frozen, but optimizes a small continuous task-specific vector (called the prefix). Prefix-tuning draws inspiration from prompting, allowing subsequent tokens to attend to this prefix as if it were "virtual tokens". We apply prefix-tuning to GPT-2 for table-to-text generation and to BART for summarization. We find that by learning only 0.1\% of the parameters, prefix-tuning obtains comparable performance in the full data setting, outperforms fine-tuning in low-data settings, and extrapolates better to examples with topics unseen during training*.
## PrefixTuningConfig
[[autodoc]] tuners.prefix_tuning.config.PrefixTuningConfig
## PrefixEncoder
[[autodoc]] tuners.prefix_tuning.model.PrefixEncoder


@ -0,0 +1,31 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Prompt tuning
[Prompt tuning](https://hf.co/papers/2104.08691) adds task-specific prompts to the input, and these prompt parameters are updated independently of the pretrained model parameters which are frozen.
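A minimal sketch of configuring prompt tuning, here initializing the soft prompt from text; the model id, prompt text, and hyperparameters are placeholders chosen for illustration.
```py
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, get_peft_model

model_id = "facebook/opt-350m"  # placeholder base model
config = PromptTuningConfig(
    task_type="CAUSAL_LM",
    num_virtual_tokens=8,
    prompt_tuning_init=PromptTuningInit.TEXT,  # initialize the soft prompt from real token embeddings
    prompt_tuning_init_text="Classify the sentiment of this review:",
    tokenizer_name_or_path=model_id,
)
model = get_peft_model(AutoModelForCausalLM.from_pretrained(model_id), config)
model.print_trainable_parameters()
```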
The abstract from the paper is:
*In this work, we explore "prompt tuning", a simple yet effective mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin. More remarkably, through ablations on model size using T5, we show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, our method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). This finding is especially relevant in that large models are costly to share and serve, and the ability to reuse one frozen model for multiple downstream tasks can ease this burden. Our method can be seen as a simplification of the recently proposed "prefix tuning" of Li and Liang (2021), and we provide a comparison to this and other similar approaches. Finally, we show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer, as compared to full model tuning*.
## PromptTuningConfig
[[autodoc]] tuners.prompt_tuning.config.PromptTuningConfig
## PromptEmbedding
[[autodoc]] tuners.prompt_tuning.model.PromptEmbedding


@ -0,0 +1,27 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Tuners
A tuner (or adapter) is a module that can be plugged into a `torch.nn.Module`. [`BaseTuner`] is the base class for other tuners and provides shared methods and attributes for preparing an adapter configuration and replacing a target module with the adapter module. [`BaseTunerLayer`] is the base class for adapter layers. It offers methods and attributes for managing adapters, such as activating and disabling adapters.
## BaseTuner
[[autodoc]] tuners.tuners_utils.BaseTuner
## BaseTunerLayer
[[autodoc]] tuners.tuners_utils.BaseTunerLayer


@ -0,0 +1,40 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks
## Overview
[VB-LoRA](https://arxiv.org/abs/2405.15179) is a parameter-efficient fine-tuning technique that extends LoRA by learning a fine-grained parameter-sharing scheme at the sub-vector level, achieving significantly higher parameter efficiency. This makes VB-LoRA especially useful in scenarios where storage and transmission costs are critical. It works by decomposing low-rank matrices—from different layers and modules such as K, Q, V, and FFN—into sub-vectors, which are then globally shared through a vector bank.
The abstract from the paper is:
*As the adoption of large language models increases and the need for per-user or per-task model customization grows, the parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and its variants, incur substantial storage and transmission costs. To further reduce stored parameters, we introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules and layers by sharing parameters globally via a vector bank. As an instantiation of the paradigm to LoRA, our proposed VB-LoRA composites all the low-rank matrices of LoRA from a shared vector bank with a differentiable top-k admixture module. VB-LoRA achieves extreme parameter efficiency while maintaining comparable or better performance compared to state-of-the-art PEFT methods. Extensive experiments demonstrate the effectiveness of VB-LoRA on natural language understanding, natural language generation, and instruction tuning tasks. When fine-tuning the Llama2-13B model, VB-LoRA only uses 0.4% of LoRA's stored parameters, yet achieves superior results.*
## Usage Tips
- VB-LoRA utilizes a sparse top-k module to learn the sharing mechanism. When saving adapter parameters, you can either save only the top-k weights and their indices by setting `save_only_topk_weights = True` in `VBLoRAConfig`, or save all the trainable logits by setting it to `False`. Enabling `save_only_topk_weights = True` significantly reduces storage space; for instance, in Llama2-7B, the storage file size decreases from 308MB to 2.5MB. Note that models saved with `save_only_topk_weights = True` are intended for merging or inference only and cannot be used to resume training.
- VB-LoRA has two sets of training parameters: vector bank parameters and logit parameters. In practice, we found that logit parameters require a higher learning rate, while vector bank parameters require a lower learning rate. When using the AdamW optimizer, typical learning rates are 0.01 for logits and 0.001 for vector bank parameters.
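For example, a rough sketch combining both tips; the base model, target modules, hyperparameters, and the `vblora_*` parameter-name substrings below are assumptions made for illustration, not guaranteed API.
```py
import torch
from transformers import AutoModelForCausalLM
from peft import VBLoRAConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
config = VBLoRAConfig(
    task_type="CAUSAL_LM",
    r=4,
    target_modules=["q_proj", "v_proj"],  # placeholder target modules
    save_only_topk_weights=True,          # much smaller checkpoints, but not resumable for training
)
model = get_peft_model(base_model, config)

# Separate learning rates for the logits and the vector bank
# (the name substrings below are assumptions about VB-LoRA's parameter naming)
logits_params = [p for n, p in model.named_parameters() if p.requires_grad and "vblora_logits" in n]
bank_params = [p for n, p in model.named_parameters() if p.requires_grad and "vblora_vector_bank" in n]
optimizer = torch.optim.AdamW([
    {"params": logits_params, "lr": 1e-2},
    {"params": bank_params, "lr": 1e-3},
])
```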
## VBLoRAConfig
[[autodoc]] tuners.vblora.config.VBLoRAConfig
## VBLoRAModel
[[autodoc]] tuners.vblora.model.VBLoRAModel


@ -0,0 +1,42 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# VeRA: Vector-based Random Matrix Adaptation
[VeRA](https://huggingface.co/papers/2310.11454) is a parameter-efficient fine-tuning technique that is similar to LoRA but requires even fewer extra parameters while promising similar or even better performance. As such, it is particularly useful when the parameter budget is very limited, e.g. when scaling to very large models. The reduction of the count of trainable parameters is achieved by sharing the same low-rank matrices across all layers, and only training two additional vectors per layer.
When saving the adapter parameters, it's possible to eschew storing the low rank matrices by setting `save_projection=False` on the `VeraConfig`. In that case, these matrices will be restored based on the fixed random seed from the `projection_prng_key` argument. This cuts down on the size of the checkpoint, but we cannot guarantee reproducibility on all devices and for all future versions of PyTorch. If you want to ensure reproducibility, set `save_projection=True` (which is the default).
To handle different shapes of adapted layers, VeRA initializes shared A and B matrices with the largest required size for each dimension. During the forward pass, submatrices A and B for a given layer are sliced out from these shared matrices and used as described in the paper. For example, adapting two linear layers of shapes (100, 20) and (80, 50) will create A and B matrices of shapes (rank, 50) and (100, rank) respectively. Then, to adapt a layer of shape (100, 20), submatrices A and B of shapes (rank, 20) and (100, rank) will be extracted.
VeRA currently has the following constraints:
- Only `nn.Linear` layers are supported.
- Quantized layers are not supported.
If these constraints don't work for your use case, use LoRA instead.
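A minimal sketch of a VeRA setup; the base model, target modules, and rank are placeholders, and `save_projection=True` is the default shown explicitly.
```py
from transformers import AutoModelForCausalLM
from peft import VeraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
config = VeraConfig(
    task_type="CAUSAL_LM",
    r=256,
    target_modules=["q_proj", "v_proj"],  # placeholder nn.Linear target modules
    save_projection=True,                 # store the shared random matrices in the checkpoint (default)
    projection_prng_key=0,                # seed used to regenerate them when save_projection=False
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```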
The abstract from the paper is:
> Low-rank adapation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even larger models or deploying numerous per-user or per-task adapted models. In this work, we present Vector-based Random Matrix Adaptation (VeRA), which significantly reduces the number of trainable parameters compared to LoRA, yet maintains the same performance. It achieves this by using a single pair of low-rank matrices shared across all layers and learning small scaling vectors instead. We demonstrate its effectiveness on the GLUE and E2E benchmarks, image classification tasks, and show its application in instruction-tuning of 7B and 13B language models.
## VeRAConfig
[[autodoc]] tuners.vera.config.VeraConfig
## VeRAModel
[[autodoc]] tuners.vera.model.VeraModel


@ -0,0 +1,56 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# X-LoRA
Mixture of LoRA Experts ([X-LoRA](https://arxiv.org/abs/2402.07148)) is a PEFT method enabling a sparse or dense mixture of LoRA experts based on a high-granularity (token, layer, sequence) scalings matrix. It leverages frozen LoRA adapters and a frozen base model to drastically reduce the number of parameters that need to be fine-tuned.
A unique aspect of X-LoRA is its versatility: it can be applied to any `transformers` base model with LoRA adapters. This means that, despite the mixture of experts strategy, no changes need to be made to the model code.
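A rough sketch of building an X-LoRA model from existing LoRA adapters; the base model, adapter paths, and hyperparameters are placeholders, and the exact config fields should be double-checked against [`XLoraConfig`].
```py
from transformers import AutoModelForCausalLM
from peft import XLoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder base model
config = XLoraConfig(
    task_type="CAUSAL_LM",
    hidden_size=base_model.config.hidden_size,
    adapters={  # adapter names mapped to pre-trained LoRA adapter paths (placeholders)
        "adapter_1": "path/to/lora_adapter_1",
        "adapter_2": "path/to/lora_adapter_2",
    },
    xlora_depth=4,  # depth of the scaling classifier (assumed value)
)
model = get_peft_model(base_model, config)
```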
The below graphic demonstrates how the scalings change for different prompts for each token. This highlights the activation of different adapters as the generation progresses and the sequence creates new context.
![Token-by-token scalings](https://github.com/EricLBuehler/xlora/raw/master/res/token_by_token_scalings.gif)
The abstract from the paper is:
*We report a mixture of expert strategy to create fine-tuned large language models using a deep layer-wise token-level approach based on low-rank adaptation (LoRA). Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks. The design is inspired by the biological principles of universality and diversity, where neural network building blocks are reused in different hierarchical manifestations. Hence, the X-LoRA model can be easily implemented for any existing large language model (LLM) without a need for modifications of the underlying structure. We develop a tailored X-LoRA model that offers scientific capabilities including forward/inverse analysis tasks and enhanced reasoning capability, focused on biomaterial analysis, protein mechanics and design. The impact of this work include access to readily expandable and adaptable models with strong domain knowledge and the capability to integrate across areas of knowledge. Featuring experts in biology, mathematics, reasoning, bio-inspired materials, mechanics and materials, chemistry, protein biophysics, mechanics and quantum-mechanics based molecular properties, we conduct a series of physics-focused case studies. We examine knowledge recall, protein mechanics forward/inverse tasks, protein design, adversarial agentic modeling including ontological knowledge graph construction, as well as molecular design. The model is capable not only of making quantitative predictions of nanomechanical properties of proteins or quantum mechanical molecular properties, but also reasons over the results and correctly predicts likely mechanisms that explain distinct molecular behaviors.*.
Please cite X-LoRA as:
```bibtex
@article{10.1063/5.0203126,
author = {Buehler, Eric L. and Buehler, Markus J.},
title = "{X-LoRA: Mixture of low-rank adapter experts, a flexible framework for large language models with applications in protein mechanics and molecular design}",
journal = {APL Machine Learning},
volume = {2},
number = {2},
pages = {026119},
year = {2024},
month = {05},
abstract = "{We report a mixture of expert strategy to create fine-tuned large language models using a deep layer-wise token-level approach based on low-rank adaptation (LoRA). Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks. The design is inspired by the biological principles of universality and diversity, where neural network building blocks are reused in different hierarchical manifestations. Hence, the X-LoRA model can be easily implemented for any existing large language model without a need for modifications of the underlying structure. We develop a tailored X-LoRA model that offers scientific capabilities, including forward/inverse analysis tasks and enhanced reasoning capability, focused on biomaterial analysis, protein mechanics, and design. The impact of this work includes access to readily expandable and adaptable models with strong domain knowledge and the capability to integrate across areas of knowledge. Featuring experts in biology, mathematics, reasoning, bio-inspired materials, mechanics and materials, chemistry, protein biophysics, mechanics, and quantum-mechanics based molecular properties, we conduct a series of physics-focused case studies. We examine knowledge recall, protein mechanics forward/inverse tasks, protein design, adversarial agentic modeling including ontological knowledge graph construction, and molecular design. The model is capable not only of making quantitative predictions of nanomechanical properties of proteins or quantum mechanical molecular properties but also reasoning over the results and correctly predicting likely mechanisms that explain distinct molecular behaviors.}",
issn = {2770-9019},
doi = {10.1063/5.0203126},
url = {https://doi.org/10.1063/5.0203126},
eprint = {https://pubs.aip.org/aip/aml/article-pdf/doi/10.1063/5.0203126/19964043/026119\_1\_5.0203126.pdf},
}
```
## XLoraConfig
[[autodoc]] tuners.xlora.config.XLoraConfig
## XLoraModel
[[autodoc]] tuners.xlora.model.XLoraModel

docs/source/quicktour.md

@ -0,0 +1,170 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Quicktour
PEFT offers parameter-efficient methods for finetuning large pretrained models. The traditional paradigm is to finetune all of a model's parameters for each downstream task, but this is becoming exceedingly costly and impractical because of the enormous number of parameters in models today. Instead, it is more efficient to train a smaller number of prompt parameters or use a reparametrization method like low-rank adaptation (LoRA) to reduce the number of trainable parameters.
This quicktour will show you PEFT's main features and how you can train or run inference on large models that would typically be inaccessible on consumer devices.
## Train
Each PEFT method is defined by a [`PeftConfig`] class that stores all the important parameters for building a [`PeftModel`]. For example, to train with LoRA, load and create a [`LoraConfig`] class and specify the following parameters:
- `task_type`: the task to train for (sequence-to-sequence language modeling in this case)
- `inference_mode`: whether you're using the model for inference or not
- `r`: the dimension of the low-rank matrices
- `lora_alpha`: the scaling factor for the low-rank matrices
- `lora_dropout`: the dropout probability of the LoRA layers
```python
from peft import LoraConfig, TaskType
peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1)
```
<Tip>
See the [`LoraConfig`] reference for more details about other parameters you can adjust, such as the modules to target or the bias type.
</Tip>
Once the [`LoraConfig`] is set up, create a [`PeftModel`] with the [`get_peft_model`] function. It takes a base model - which you can load from the Transformers library - and the [`LoraConfig`] containing the parameters for how to configure a model for training with LoRA.
Load the base model you want to finetune.
```python
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
```
Wrap the base model and `peft_config` with the [`get_peft_model`] function to create a [`PeftModel`]. To get a sense of the number of trainable parameters in your model, use the [`print_trainable_parameters`] method.
```python
from peft import get_peft_model
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"output: trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282"
```
Out of [bigscience/mt0-large's](https://huggingface.co/bigscience/mt0-large) 1.2B parameters, you're only training 0.19% of them!
That is it 🎉! Now you can train the model with the Transformers [`~transformers.Trainer`], Accelerate, or any custom PyTorch training loop.
For example, to train with the [`~transformers.Trainer`] class, set up a [`~transformers.TrainingArguments`] class with some training hyperparameters.
```py
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
output_dir="your-name/bigscience/mt0-large-lora",
learning_rate=1e-3,
per_device_train_batch_size=32,
per_device_eval_batch_size=32,
num_train_epochs=2,
weight_decay=0.01,
eval_strategy="epoch",
save_strategy="epoch",
load_best_model_at_end=True,
)
```
Pass the model, training arguments, dataset, tokenizer, and any other necessary component to the [`~transformers.Trainer`], and call [`~transformers.Trainer.train`] to start training.
```py
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["test"],
tokenizer=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics,
)
trainer.train()
```
### Save model
After your model is finished training, you can save your model to a directory using the [`~transformers.PreTrainedModel.save_pretrained`] function.
```py
model.save_pretrained("output_dir")
```
You can also save your model to the Hub (make sure you're logged in to your Hugging Face account first) with the [`~transformers.PreTrainedModel.push_to_hub`] function.
```python
from huggingface_hub import notebook_login
notebook_login()
model.push_to_hub("your-name/bigscience/mt0-large-lora")
```
Both methods only save the extra PEFT weights that were trained, meaning it is super efficient to store, transfer, and load. For example, this [facebook/opt-350m](https://huggingface.co/ybelkada/opt-350m-lora) model trained with LoRA only contains two files: `adapter_config.json` and `adapter_model.safetensors`. The `adapter_model.safetensors` file is just 6.3MB!
<div class="flex flex-col justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/PEFT-hub-screenshot.png"/>
<figcaption class="text-center">The adapter weights for an opt-350m model stored on the Hub are only ~6MB compared to the full size of the model weights, which can be ~700MB.</figcaption>
</div>
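To use the saved adapter again later, load it back onto a freshly initialized base model with [`~PeftModel.from_pretrained`]. A minimal sketch reusing the directory from above:
```py
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

base_model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
model = PeftModel.from_pretrained(base_model, "output_dir")  # or the Hub id you pushed to
```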
## Inference
<Tip>
Take a look at the [AutoPeftModel](package_reference/auto_class) API reference for a complete list of available `AutoPeftModel` classes.
</Tip>
Easily load any PEFT-trained model for inference with the [`AutoPeftModel`] class and the [`~transformers.PreTrainedModel.from_pretrained`] method:
```py
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch
model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = model.to("cuda")
model.eval()
inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=50)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])
"Preheat the oven to 350 degrees and place the cookie dough in the center of the oven. In a large bowl, combine the flour, baking powder, baking soda, salt, and cinnamon. In a separate bowl, combine the egg yolks, sugar, and vanilla."
```
For other tasks that aren't explicitly supported with an `AutoPeftModelFor` class - such as automatic speech recognition - you can still use the base [`AutoPeftModel`] class to load a model for the task.
```py
from peft import AutoPeftModel
model = AutoPeftModel.from_pretrained("smangrul/openai-whisper-large-v2-LORA-colab")
```
## Next steps
Now that you've seen how to train a model with one of the PEFT methods, we encourage you to try out some of the other methods like prompt tuning. The steps are very similar to the ones shown in the quicktour:
1. prepare a [`PeftConfig`] for a PEFT method
2. use the [`get_peft_model`] method to create a [`PeftModel`] from the configuration and base model
Then you can train it however you like! To load a PEFT model for inference, you can use the [`AutoPeftModel`] class.
Feel free to also take a look at the task guides if you're interested in training a model with another PEFT method for a specific task such as semantic segmentation, multilingual automatic speech recognition, DreamBooth, token classification, and more.


@ -0,0 +1,239 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# IA3
[IA3](../conceptual_guides/ia3) multiplies the model's activations (the keys and values in the self-attention and encoder-decoder attention blocks, and the intermediate activation of the position-wise feedforward network) by three learned vectors. This PEFT method introduces an even smaller number of trainable parameters than LoRA which introduces weight matrices instead of vectors. The original model's parameters are kept frozen and only these vectors are updated. As a result, it is faster, cheaper and more efficient to finetune for a new downstream task.
This guide will show you how to train a sequence-to-sequence model with IA3 to *generate a sentiment* given some financial news.
<Tip>
Some familiarity with the general process of training a sequence-to-sequence model would be really helpful and allow you to focus on how to apply IA3. If you're new, we recommend taking a look at the [Translation](https://huggingface.co/docs/transformers/tasks/translation) and [Summarization](https://huggingface.co/docs/transformers/tasks/summarization) guides first from the Transformers documentation. When you're ready, come back and see how easy it is to drop PEFT into your training!
</Tip>
## Dataset
You'll use the sentences_allagree subset of the [financial_phrasebank](https://huggingface.co/datasets/financial_phrasebank) dataset. This subset contains financial news with 100% annotator agreement on the sentiment label. Take a look at the [dataset viewer](https://huggingface.co/datasets/financial_phrasebank/viewer/sentences_allagree) for a better idea of the data and sentences you'll be working with.
Load the dataset with the [`~datasets.load_dataset`] function. This subset of the dataset only contains a train split, so use the [`~datasets.Dataset.train_test_split`] method to create train and validation splits. Create a new `text_label` column so it is easier to understand what the `label` values `0`, `1`, and `2` mean.
```py
from datasets import load_dataset
ds = load_dataset("financial_phrasebank", "sentences_allagree")
ds = ds["train"].train_test_split(test_size=0.1)
ds["validation"] = ds["test"]
del ds["test"]
classes = ds["train"].features["label"].names
ds = ds.map(
lambda x: {"text_label": [classes[label] for label in x["label"]]},
batched=True,
num_proc=1,
)
ds["train"][0]
{'sentence': 'It will be operated by Nokia , and supported by its Nokia NetAct network and service management system .',
'label': 1,
'text_label': 'neutral'}
```
Load a tokenizer and create a preprocessing function that:
1. tokenizes the inputs, and pads and truncates each sequence to `max_length`
2. applies the same tokenizer to the labels, but with a shorter `max_length` that corresponds to the label
3. masks the padding tokens
```py
from transformers import AutoTokenizer
text_column = "sentence"
label_column = "text_label"
max_length = 128
tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-large")
def preprocess_function(examples):
inputs = examples[text_column]
targets = examples[label_column]
model_inputs = tokenizer(inputs, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
labels = tokenizer(targets, max_length=3, padding="max_length", truncation=True, return_tensors="pt")
labels = labels["input_ids"]
labels[labels == tokenizer.pad_token_id] = -100
model_inputs["labels"] = labels
return model_inputs
```
Use the [`~datasets.Dataset.map`] function to apply the preprocessing function to the entire dataset.
```py
processed_ds = ds.map(
preprocess_function,
batched=True,
num_proc=1,
remove_columns=ds["train"].column_names,
load_from_cache_file=False,
desc="Running tokenizer on dataset",
)
```
Create a training and evaluation [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), and set `pin_memory=True` to speed up data transfer to the GPU during training if your dataset samples are on a CPU.
```py
from torch.utils.data import DataLoader
from transformers import default_data_collator
train_ds = processed_ds["train"]
eval_ds = processed_ds["validation"]
batch_size = 8
train_dataloader = DataLoader(
train_ds, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True
)
eval_dataloader = DataLoader(eval_ds, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
```
## Model
Now you can load a pretrained model to use as the base model for IA3. This guide uses the [bigscience/mt0-large](https://huggingface.co/bigscience/mt0-large) model, but you can use any sequence-to-sequence model you like.
```py
from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/mt0-large")
```
### PEFT configuration and model
All PEFT methods need a configuration that specifies all the parameters for how the PEFT method should be applied. Create an [`IA3Config`] with the task type and set the inference mode to `False`. You can find additional parameters for this configuration in the [API reference](../package_reference/ia3#ia3config).
<Tip>
Call the [`~PeftModel.print_trainable_parameters`] method to compare the number of trainable parameters of [`PeftModel`] versus the number of parameters in the base model!
</Tip>
Once the configuration is set up, pass it to the [`get_peft_model`] function along with the base model to create a trainable [`PeftModel`].
```py
from peft import IA3Config, get_peft_model
peft_config = IA3Config(task_type="SEQ_2_SEQ_LM")
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 282,624 || all params: 1,229,863,936 || trainable%: 0.022980103060766553"
```
### Training
Set up an optimizer and learning rate scheduler.
```py
import torch
from transformers import get_linear_schedule_with_warmup
lr = 8e-3
num_epochs = 3
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
optimizer=optimizer,
num_warmup_steps=0,
num_training_steps=(len(train_dataloader) * num_epochs),
)
```
Move the model to the GPU and create a training loop that reports the loss and perplexity for each epoch.
```py
from tqdm import tqdm
device = "cuda"
model = model.to(device)
for epoch in range(num_epochs):
model.train()
total_loss = 0
for step, batch in enumerate(tqdm(train_dataloader)):
batch = {k: v.to(device) for k, v in batch.items()}
outputs = model(**batch)
loss = outputs.loss
total_loss += loss.detach().float()
loss.backward()
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
model.eval()
eval_loss = 0
eval_preds = []
for step, batch in enumerate(tqdm(eval_dataloader)):
batch = {k: v.to(device) for k, v in batch.items()}
with torch.no_grad():
outputs = model(**batch)
loss = outputs.loss
eval_loss += loss.detach().float()
eval_preds.extend(
tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
)
eval_epoch_loss = eval_loss / len(eval_dataloader)
eval_ppl = torch.exp(eval_epoch_loss)
train_epoch_loss = total_loss / len(train_dataloader)
train_ppl = torch.exp(train_epoch_loss)
print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")
```
## Share your model
After training is complete, you can upload your model to the Hub with the [`~transformers.PreTrainedModel.push_to_hub`] method. You'll need to log in to your Hugging Face account first and enter your token when prompted.
```py
from huggingface_hub import notebook_login
notebook_login()
account = "<your-hf-account-name>"
peft_model_id = f"{account}/mt0-large-ia3"
model.push_to_hub(peft_model_id)
```
## Inference
To load the model for inference, use the [`~AutoPeftModelForSeq2SeqLM.from_pretrained`] method. Let's also load a sentence of financial news from the dataset to generate a sentiment for.
```py
from peft import AutoPeftModelForSeq2SeqLM
model = AutoPeftModelForSeq2SeqLM.from_pretrained("<your-hf-account-name>/mt0-large-ia3").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-large")
i = 15
inputs = tokenizer(ds["validation"][text_column][i], return_tensors="pt")
print(ds["validation"][text_column][i])
"The robust growth was the result of the inclusion of clothing chain Lindex in the Group in December 2007 ."
```
Call the [`~transformers.GenerationMixin.generate`] method to generate the predicted sentiment label.
```py
with torch.no_grad():
inputs = {k: v.to(device) for k, v in inputs.items()}
outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
['positive']
```

<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# LoRA methods
A popular way to efficiently train large models is to insert smaller trainable matrices (typically into the attention blocks) that form a low-rank decomposition of the delta weight matrix to be learnt during finetuning. The pretrained model's original weight matrix is frozen and only the smaller matrices are updated during training. This reduces the number of trainable parameters, which lowers memory usage and training time, both of which can be very expensive for large models.
There are several different ways to express the weight matrix as a low-rank decomposition, but [Low-Rank Adaptation (LoRA)](../conceptual_guides/adapter#low-rank-adaptation-lora) is the most common method. The PEFT library supports several other LoRA variants, such as [Low-Rank Hadamard Product (LoHa)](../conceptual_guides/adapter#low-rank-hadamard-product-loha), [Low-Rank Kronecker Product (LoKr)](../conceptual_guides/adapter#low-rank-kronecker-product-lokr), and [Adaptive Low-Rank Adaptation (AdaLoRA)](../conceptual_guides/adapter#adaptive-low-rank-adaptation-adalora). You can learn more about how these methods work conceptually in the [Adapters](../conceptual_guides/adapter) guide. If you're interested in applying these methods to other tasks and use cases such as semantic segmentation or token classification, take a look at our [notebook collection](https://huggingface.co/collections/PEFT/notebooks-6573b28b33e5a4bf5b157fc1)!
Additionally, PEFT supports the [X-LoRA](../conceptual_guides/adapter#mixture-of-lora-experts-x-lora) Mixture of LoRA Experts method.
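As a rough sketch of the savings (the 1024x1024 shape here is only illustrative, not tied to any particular model), a rank-16 decomposition needs only a small fraction of the parameters of a full weight update:
```py
import torch

d, r = 1024, 16
lora_A = torch.nn.Parameter(torch.randn(r, d) * 0.01)   # small random init
lora_B = torch.nn.Parameter(torch.zeros(d, r))          # zero init so the update starts at zero
delta_W = lora_B @ lora_A                               # low-rank weight update, shape (1024, 1024)
print(d * d)                                            # 1048576 parameters for a full update
print(lora_A.numel() + lora_B.numel())                  # 32768 parameters for the low-rank update
```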
This guide will show you how to quickly train an image classification model - with a low-rank decomposition method - to identify the class of food shown in an image.
<Tip>
Some familiarity with the general process of training an image classification model would be really helpful and allow you to focus on the low-rank decomposition methods. If you're new, we recommend taking a look at the [Image classification](https://huggingface.co/docs/transformers/tasks/image_classification) guide first from the Transformers documentation. When you're ready, come back and see how easy it is to drop PEFT in to your training!
</Tip>
Before you begin, make sure you have all the necessary libraries installed.
```bash
pip install -q peft transformers datasets
```
## Dataset
In this guide, you'll use the [Food-101](https://huggingface.co/datasets/food101) dataset which contains images of 101 food classes (take a look at the [dataset viewer](https://huggingface.co/datasets/food101/viewer/default/train) to get a better idea of what the dataset looks like).
Load the dataset with the [`~datasets.load_dataset`] function.
```py
from datasets import load_dataset
ds = load_dataset("food101")
```
Each food class is labeled with an integer, so to make it easier to understand what these integers represent, you'll create a `label2id` and `id2label` dictionary to map the integer to its class label.
```py
labels = ds["train"].features["label"].names
label2id, id2label = dict(), dict()
for i, label in enumerate(labels):
label2id[label] = i
id2label[i] = label
id2label[2]
"baklava"
```
Load an image processor to properly resize and normalize the pixel values of the training and evaluation images.
```py
from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
```
You can also use the image processor to prepare some transformation functions for data augmentation and pixel scaling.
```py
from torchvision.transforms import (
CenterCrop,
Compose,
Normalize,
RandomHorizontalFlip,
RandomResizedCrop,
Resize,
ToTensor,
)
normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
train_transforms = Compose(
[
RandomResizedCrop(image_processor.size["height"]),
RandomHorizontalFlip(),
ToTensor(),
normalize,
]
)
val_transforms = Compose(
[
Resize(image_processor.size["height"]),
CenterCrop(image_processor.size["height"]),
ToTensor(),
normalize,
]
)
def preprocess_train(example_batch):
example_batch["pixel_values"] = [train_transforms(image.convert("RGB")) for image in example_batch["image"]]
return example_batch
def preprocess_val(example_batch):
example_batch["pixel_values"] = [val_transforms(image.convert("RGB")) for image in example_batch["image"]]
return example_batch
```
Define the training and validation datasets, and use the [`~datasets.Dataset.set_transform`] function to apply the transformations on-the-fly.
```py
train_ds = ds["train"]
val_ds = ds["validation"]
train_ds.set_transform(preprocess_train)
val_ds.set_transform(preprocess_val)
```
Finally, you'll need a data collator to create a batch of training and evaluation data and convert the labels to `torch.tensor` objects.
```py
import torch
def collate_fn(examples):
pixel_values = torch.stack([example["pixel_values"] for example in examples])
labels = torch.tensor([example["label"] for example in examples])
return {"pixel_values": pixel_values, "labels": labels}
```
## Model
Now let's load a pretrained model to use as the base model. This guide uses the [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) model, but you can use any image classification model you want. Pass the `label2id` and `id2label` dictionaries to the model so it knows how to map the integer labels to their class labels, and you can optionally pass the `ignore_mismatched_sizes=True` parameter if you're finetuning a checkpoint that has already been finetuned.
```py
from transformers import AutoModelForImageClassification, TrainingArguments, Trainer
model = AutoModelForImageClassification.from_pretrained(
"google/vit-base-patch16-224-in21k",
label2id=label2id,
id2label=id2label,
ignore_mismatched_sizes=True,
)
```
### PEFT configuration and model
Every PEFT method requires a configuration that holds all the parameters specifying how the PEFT method should be applied. Once the configuration is set up, pass it to the [`~peft.get_peft_model`] function along with the base model to create a trainable [`PeftModel`].
<Tip>
Call the [`~PeftModel.print_trainable_parameters`] method to compare the number of parameters of [`PeftModel`] versus the number of parameters in the base model!
</Tip>
<hfoptions id="loras">
<hfoption id="LoRA">
[LoRA](../conceptual_guides/adapter#low-rank-adaptation-lora) decomposes the weight update matrix into *two* smaller matrices. The size of these low-rank matrices is determined by their *rank* or `r`. A higher rank means the model has more parameters to train, but it also means the model has more learning capacity. You'll also want to specify the `target_modules` which determine where the smaller matrices are inserted. For this guide, you'll target the *query* and *value* matrices of the attention blocks. Other important parameters to set are `lora_alpha` (scaling factor), `bias` (whether `none`, `all` or only the LoRA bias parameters should be trained), and `modules_to_save` (the modules apart from the LoRA layers to be trained and saved). All of these parameters - and more - are found in the [`LoraConfig`].
```py
from peft import LoraConfig, get_peft_model
config = LoraConfig(
r=16,
lora_alpha=16,
target_modules=["query", "value"],
lora_dropout=0.1,
bias="none",
modules_to_save=["classifier"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
"trainable params: 667,493 || all params: 86,543,818 || trainable%: 0.7712775047664294"
```
</hfoption>
<hfoption id="LoHa">
[LoHa](../conceptual_guides/adapter#low-rank-hadamard-product-loha) decomposes the weight update matrix into *four* smaller matrices and each pair of smaller matrices is combined with the Hadamard product. This allows the weight update matrix to keep the same number of trainable parameters when compared to LoRA, but with a higher rank (`r^2` for LoHA when compared to `2*r` for LoRA). The size of the smaller matrices is determined by their *rank* or `r`. You'll also want to specify the `target_modules` which determines where the smaller matrices are inserted. For this guide, you'll target the *query* and *value* matrices of the attention blocks. Other important parameters to set are `alpha` (scaling factor), and `modules_to_save` (the modules apart from the LoHa layers to be trained and saved). All of these parameters - and more - are found in the [`LoHaConfig`].
```py
from peft import LoHaConfig, get_peft_model
config = LoHaConfig(
r=16,
alpha=16,
target_modules=["query", "value"],
module_dropout=0.1,
modules_to_save=["classifier"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
"trainable params: 1,257,317 || all params: 87,133,642 || trainable%: 1.4429753779831676"
```
</hfoption>
<hfoption id="LoKr">
[LoKr](../conceptual_guides/adapter#low-rank-kronecker-product-lokr) expresses the weight update matrix as a decomposition of a Kronecker product, creating a block matrix that is able to preserve the rank of the original weight matrix. The sizes of the smaller matrices are determined by their *rank* or `r`. You'll also want to specify the `target_modules` which determines where the smaller matrices are inserted. For this guide, you'll target the *query* and *value* matrices of the attention blocks. Other important parameters to set are `alpha` (scaling factor), and `modules_to_save` (the modules apart from the LoKr layers to be trained and saved). All of these parameters - and more - are found in the [`LoKrConfig`].
```py
from peft import LoKrConfig, get_peft_model
config = LoKrConfig(
r=16,
alpha=16,
target_modules=["query", "value"],
module_dropout=0.1,
modules_to_save=["classifier"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
"trainable params: 116,069 || all params: 87,172,042 || trainable%: 0.13314934162033282"
```
</hfoption>
<hfoption id="AdaLoRA">
[AdaLoRA](../conceptual_guides/adapter#adaptive-low-rank-adaptation-adalora) efficiently manages the LoRA parameter budget by assigning important weight matrices more parameters and pruning less important ones. In contrast, LoRA evenly distributes parameters across all modules. You can control the average desired *rank* or `r` of the matrices, and which modules to apply AdaLoRA to with `target_modules`. Other important parameters to set are `lora_alpha` (scaling factor), and `modules_to_save` (the modules apart from the AdaLoRA layers to be trained and saved). All of these parameters - and more - are found in the [`AdaLoraConfig`].
```py
from peft import AdaLoraConfig, get_peft_model
config = AdaLoraConfig(
r=8,
init_r=12,
tinit=200,
tfinal=1000,
deltaT=10,
target_modules=["query", "value"],
modules_to_save=["classifier"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
"trainable params: 520,325 || all params: 87,614,722 || trainable%: 0.5938785036606062"
```
</hfoption>
</hfoptions>
### Training
For training, let's use the [`~transformers.Trainer`] class from Transformers. The [`Trainer`] contains a PyTorch training loop, and when you're ready, call [`~transformers.Trainer.train`] to start training. To customize the training run, configure the training hyperparameters in the [`~transformers.TrainingArguments`] class. With LoRA-like methods, you can afford to use a higher batch size and learning rate.
> [!WARNING]
> AdaLoRA has an [`~AdaLoraModel.update_and_allocate`] method that should be called at each training step to update the parameter budget and mask, otherwise the adaptation step is not performed. This requires writing a custom training loop or subclassing the [`~transformers.Trainer`] to incorporate this method. As an example, take a look at this [custom training loop](https://github.com/huggingface/peft/blob/912ad41e96e03652cabf47522cd876076f7a0c4f/examples/conditional_generation/peft_adalora_seq2seq.py#L120).
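If you do train with AdaLoRA, a minimal sketch of such a custom loop could look like the following; the dataloader, optimizer, and learning rate scheduler are assumed to be set up the same way as in a regular PyTorch loop:
```py
# Minimal sketch for AdaLoRA only; train_dataloader, optimizer, and lr_scheduler
# are assumed to be defined elsewhere, and device placement is omitted for brevity.
global_step = 0
for batch in train_dataloader:
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    lr_scheduler.step()
    # update the AdaLoRA parameter budget and rank mask after every optimizer step
    model.base_model.update_and_allocate(global_step)
    optimizer.zero_grad()
    global_step += 1
```
For the other methods in this guide, the built-in [`~transformers.Trainer`] loop works as-is, so the next step is configuring the training hyperparameters.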
```py
from transformers import TrainingArguments, Trainer
account = "stevhliu"
peft_model_id = f"{account}/vit-base-patch16-224-in21k-lora"
batch_size = 128
args = TrainingArguments(
peft_model_id,
remove_unused_columns=False,
eval_strategy="epoch",
save_strategy="epoch",
learning_rate=5e-3,
per_device_train_batch_size=batch_size,
gradient_accumulation_steps=4,
per_device_eval_batch_size=batch_size,
fp16=True,
num_train_epochs=5,
logging_steps=10,
load_best_model_at_end=True,
label_names=["labels"],
)
```
Begin training with [`~transformers.Trainer.train`].
```py
trainer = Trainer(
model,
args,
train_dataset=train_ds,
eval_dataset=val_ds,
tokenizer=image_processor,
data_collator=collate_fn,
)
trainer.train()
```
## Share your model
Once training is complete, you can upload your model to the Hub with the [`~transformers.PreTrainedModel.push_to_hub`] method. You'll need to log in to your Hugging Face account first and enter your token when prompted.
```py
from huggingface_hub import notebook_login
notebook_login()
```
Call [`~transformers.PreTrainedModel.push_to_hub`] to save your model to your repository.
```py
model.push_to_hub(peft_model_id)
```
## Inference
Let's load the model from the Hub and test it out on a food image.
```py
from peft import PeftConfig, PeftModel
from transformers import AutoImageProcessor
from PIL import Image
import requests
config = PeftConfig.from_pretrained("stevhliu/vit-base-patch16-224-in21k-lora")
model = AutoModelForImageClassification.from_pretrained(
config.base_model_name_or_path,
label2id=label2id,
id2label=id2label,
ignore_mismatched_sizes=True,
)
model = PeftModel.from_pretrained(model, "stevhliu/vit-base-patch16-224-in21k-lora")
url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/beignets.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/beignets.jpeg">
</div>
Convert the image to RGB and return the underlying PyTorch tensors.
```py
encoding = image_processor(image.convert("RGB"), return_tensors="pt")
```
Now run the model and return the predicted class!
```py
with torch.no_grad():
outputs = model(**encoding)
logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
"Predicted class: beignets"
```

<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Prompt-based methods
A prompt can describe a task or provide an example of a task you want the model to learn. Instead of manually creating these prompts, soft prompting methods add learnable parameters to the input embeddings that can be optimized for a specific task while keeping the pretrained model's parameters frozen. This makes it both faster and easier to finetune large language models (LLMs) for new downstream tasks.
The PEFT library supports several types of prompting methods (p-tuning, prefix tuning, prompt tuning) and you can learn more about how these methods work conceptually in the [Soft prompts](../conceptual_guides/prompting) guide. If you're interested in applying these methods to other tasks and use cases, take a look at our [notebook collection](https://huggingface.co/spaces/PEFT/soft-prompting)!
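As a conceptual sketch (this is not how PEFT implements it internally), soft prompting amounts to prepending a small trainable embedding tensor to the frozen model's input embeddings:
```py
import torch

batch_size, seq_len, hidden = 2, 16, 1024
num_virtual_tokens = 20

input_embeds = torch.randn(batch_size, seq_len, hidden)                    # from the frozen embedding layer
soft_prompt = torch.nn.Parameter(torch.randn(num_virtual_tokens, hidden))  # the only trained weights

# expand the prompt to the batch and prepend it; only soft_prompt receives gradients
prompted = torch.cat([soft_prompt.unsqueeze(0).expand(batch_size, -1, -1), input_embeds], dim=1)
print(prompted.shape)                                                      # torch.Size([2, 36, 1024])
```
The prompting methods below differ mainly in how these extra parameters are generated and where they are injected.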
This guide will show you how to train a causal language model - with a soft prompting method - to *generate a classification* for whether a tweet is a complaint or not.
<Tip>
Some familiarity with the general process of training a causal language model would be really helpful and allow you to focus on the soft prompting methods. If you're new, we recommend taking a look at the [Causal language modeling](https://huggingface.co/docs/transformers/tasks/language_modeling) guide first from the Transformers documentation. When you're ready, come back and see how easy it is to drop PEFT in to your training!
</Tip>
Before you begin, make sure you have all the necessary libraries installed.
```bash
pip install -q peft transformers datasets
```
## Dataset
For this guide, you'll use the `twitter_complaints` subset of the [RAFT](https://huggingface.co/datasets/ought/raft) dataset. The `twitter_complaints` subset contains tweets labeled as `complaint` and `no complaint` and you can check out the [dataset viewer](https://huggingface.co/datasets/ought/raft/viewer/twitter_complaints) for a better idea of what the data looks like.
Use the [`~datasets.load_dataset`] function to load the dataset and create a new `text_label` column so it is easier to understand what the `Label` values `1` and `2` mean.
```py
from datasets import load_dataset
ds = load_dataset("ought/raft", "twitter_complaints")
classes = [k.replace("_", " ") for k in ds["train"].features["Label"].names]
ds = ds.map(
lambda x: {"text_label": [classes[label] for label in x["Label"]]},
batched=True,
num_proc=1,
)
ds["train"][0]
{"Tweet text": "@HMRCcustomers No this is my first job", "ID": 0, "Label": 2, "text_label": "no complaint"}
```
Load a tokenizer, define the padding token to use, and determine the maximum length of the tokenized label.
```py
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")
if tokenizer.pad_token_id is None:
tokenizer.pad_token_id = tokenizer.eos_token_id
target_max_length = max([len(tokenizer(class_label)["input_ids"]) for class_label in classes])
print(target_max_length)
```
Create a preprocessing function that tokenizes the tweet text and labels, pads the inputs and labels in each batch, creates an attention mask, and truncates sequences to the `max_length`. Then it converts the `input_ids`, `attention_mask`, and `labels` to PyTorch tensors.
```py
import torch
max_length = 64
def preprocess_function(examples, text_column="Tweet text", label_column="text_label"):
batch_size = len(examples[text_column])
inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
targets = [str(x) for x in examples[label_column]]
model_inputs = tokenizer(inputs)
labels = tokenizer(targets)
for i in range(batch_size):
sample_input_ids = model_inputs["input_ids"][i]
label_input_ids = labels["input_ids"][i]
model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
max_length - len(sample_input_ids)
) + sample_input_ids
model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
"attention_mask"
][i]
labels["input_ids"][i] = [-100] * (max_length - len(label_input_ids)) + label_input_ids
model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
labels["input_ids"][i] = torch.tensor(labels["input_ids"][i][:max_length])
model_inputs["labels"] = labels["input_ids"]
return model_inputs
```
Apply the preprocessing function to the entire dataset with the [`~datasets.Dataset.map`] function, and remove the unprocessed columns because the model won't need them.
```py
processed_ds = ds.map(
preprocess_function,
batched=True,
num_proc=1,
remove_columns=ds["train"].column_names,
load_from_cache_file=False,
desc="Running tokenizer on dataset",
)
```
Finally, create a training and evaluation [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader). You can set `pin_memory=True` to speed up the data transfer to the GPU during training if the samples in your dataset are on a CPU.
```py
from torch.utils.data import DataLoader
from transformers import default_data_collator
train_ds = processed_ds["train"]
eval_ds = processed_ds["test"]
batch_size = 16
train_dataloader = DataLoader(train_ds, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
eval_dataloader = DataLoader(eval_ds, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
```
## Model
Now let's load a pretrained model to use as the base model for the soft prompt method. This guide uses the [bigscience/bloomz-560m](https://huggingface.co/bigscience/bloomz-560m) model, but you can use any causal language model you want.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-560m")
```
### PEFT configuration and model
For any PEFT method, you'll need to create a configuration which contains all the parameters that specify how the PEFT method should be applied. Once the configuration is setup, pass it to the [`~peft.get_peft_model`] function along with the base model to create a trainable [`PeftModel`].
<Tip>
Call the [`~PeftModel.print_trainable_parameters`] method to compare the number of trainable parameters of [`PeftModel`] versus the number of parameters in the base model!
</Tip>
<hfoptions id="configurations">
<hfoption id="p-tuning">
[P-tuning](../conceptual_guides/prompting#p-tuning) adds a trainable embedding tensor where the prompt tokens can be added anywhere in the input sequence. Create a [`PromptEncoderConfig`] with the task type, the number of virtual tokens to add and learn, and the hidden size of the encoder for learning the prompt parameters.
```py
from peft import PromptEncoderConfig, get_peft_model
peft_config = PromptEncoderConfig(task_type="CAUSAL_LM", num_virtual_tokens=20, encoder_hidden_size=128)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 300,288 || all params: 559,514,880 || trainable%: 0.05366935013417338"
```
</hfoption>
<hfoption id="prefix tuning">
[Prefix tuning](../conceptual_guides/prompting#prefix-tuning) adds task-specific parameters in all of the model layers, which are optimized by a separate feed-forward network. Create a [`PrefixTuningConfig`] with the task type and number of virtual tokens to add and learn.
```py
from peft import PrefixTuningConfig, get_peft_model
peft_config = PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 983,040 || all params: 560,197,632 || trainable%: 0.1754809274167014"
```
</hfoption>
<hfoption id="prompt tuning">
[Prompt tuning](../conceptual_guides/prompting#prompt-tuning) formulates all tasks as a *generation* task and adds a task-specific prompt to the input which is updated independently. The `prompt_tuning_init_text` parameter specifies the text used to initialize the soft prompt (in this case, a description of the task: classifying whether tweets are complaints or not). For the best results, `num_virtual_tokens` should match the number of tokens in the `prompt_tuning_init_text`, which you can get by tokenizing the text.
Create a [`PromptTuningConfig`] with the task type, the initial prompt tuning text to train the model with, the number of virtual tokens to add and learn, and a tokenizer.
```py
from peft import PromptTuningConfig, PromptTuningInit, get_peft_model
prompt_tuning_init_text = "Classify if the tweet is a complaint or no complaint.\n"
peft_config = PromptTuningConfig(
task_type="CAUSAL_LM",
prompt_tuning_init=PromptTuningInit.TEXT,
num_virtual_tokens=len(tokenizer(prompt_tuning_init_text)["input_ids"]),
prompt_tuning_init_text=prompt_tuning_init_text,
tokenizer_name_or_path="bigscience/bloomz-560m",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 8,192 || all params: 559,222,784 || trainable%: 0.0014648902430985358"
```
</hfoption>
</hfoptions>
### Training
Set up an optimizer and learning rate scheduler.
```py
from transformers import get_linear_schedule_with_warmup
lr = 3e-2
num_epochs = 50
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
optimizer=optimizer,
num_warmup_steps=0,
num_training_steps=(len(train_dataloader) * num_epochs),
)
```
Move the model to the GPU and create a training loop that reports the loss and perplexity for each epoch.
```py
from tqdm import tqdm
device = "cuda"
model = model.to(device)
for epoch in range(num_epochs):
model.train()
total_loss = 0
for step, batch in enumerate(tqdm(train_dataloader)):
batch = {k: v.to(device) for k, v in batch.items()}
outputs = model(**batch)
loss = outputs.loss
total_loss += loss.detach().float()
loss.backward()
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
model.eval()
eval_loss = 0
eval_preds = []
for step, batch in enumerate(tqdm(eval_dataloader)):
batch = {k: v.to(device) for k, v in batch.items()}
with torch.no_grad():
outputs = model(**batch)
loss = outputs.loss
eval_loss += loss.detach().float()
eval_preds.extend(
tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
)
eval_epoch_loss = eval_loss / len(eval_dataloader)
eval_ppl = torch.exp(eval_epoch_loss)
train_epoch_loss = total_loss / len(train_dataloader)
train_ppl = torch.exp(train_epoch_loss)
print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")
```
## Share your model
Once training is complete, you can upload your model to the Hub with the [`~transformers.PreTrainedModel.push_to_hub`] method. You'll need to log in to your Hugging Face account first and enter your token when prompted.
```py
from huggingface_hub import notebook_login
notebook_login()
account = "<your-hf-account-name>"
peft_model_id = f"{account}/bloomz-560-m-peft-method"
model.push_to_hub(peft_model_id)
```
If you check the model file size in the repository, you'll see that it is a lot smaller than a full-sized model!
<div class="flex flex-col justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/peft/PEFT-hub-screenshot.png"/>
<figcaption class="text-center">For example, the adapter weights for an opt-350m model stored on the Hub are only ~6MB compared to the full model size which can be ~700MB.</figcaption>
</div>
## Inference
Let's load the model for inference and test it out on a tweet!
```py
from peft import AutoPeftModelForCausalLM
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m")
i = 15
inputs = tokenizer(f'{text_column} : {ds["test"][i]["Tweet text"]} Label : ', return_tensors="pt")
print(ds["test"][i]["Tweet text"])
"@NYTsupport i have complained a dozen times &amp; yet my papers are still thrown FAR from my door. Why is this so hard to resolve?"
```
Call the [`~transformers.GenerationMixin.generate`] method to generate the predicted classification label.
```py
with torch.no_grad():
inputs = {k: v.to(device) for k, v in inputs.items()}
outputs = model.generate(input_ids=inputs["input_ids"], max_new_tokens=10)
print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))
"['Tweet text : @NYTsupport i have complained a dozen times &amp; yet my papers are still thrown FAR from my door. Why is this so hard to resolve? Label : complaint']"
```

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT integrations
PEFT's practical benefits extend to other Hugging Face libraries like [Diffusers](https://hf.co/docs/diffusers) and [Transformers](https://hf.co/docs/transformers). One of the main benefits of PEFT is that an adapter file generated by a PEFT method is a lot smaller than the original model, which makes it super easy to manage and use multiple adapters. You can use one pretrained base model for multiple tasks by simply loading a new adapter finetuned for the task you're solving. Or you can combine multiple adapters with a text-to-image diffusion model to create new effects.
This tutorial will show you how PEFT can help you manage adapters in Diffusers and Transformers.
## Diffusers
Diffusers is a generative AI library for creating images and videos from text or images with diffusion models. LoRA is an especially popular training method for diffusion models because you can very quickly train and share diffusion models to generate images in new styles. To make it easier to use and try multiple LoRA models, Diffusers uses the PEFT library to help manage different adapters for inference.
For example, load a base model and then load the [artificialguybr/3DRedmond-V1](https://huggingface.co/artificialguybr/3DRedmond-V1) adapter for inference with the [`load_lora_weights`](https://huggingface.co/docs/diffusers/v0.24.0/en/api/loaders/lora#diffusers.loaders.LoraLoaderMixin.load_lora_weights) method. The `adapter_name` argument in the loading method is enabled by PEFT and allows you to set a name for the adapter so it is easier to reference.
```py
import torch
from diffusers import DiffusionPipeline
pipeline = DiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipeline.load_lora_weights(
"peft-internal-testing/artificialguybr__3DRedmond-V1",
weight_name="3DRedmond-3DRenderStyle-3DRenderAF.safetensors",
adapter_name="3d"
)
image = pipeline("sushi rolls shaped like kawaii cat faces").images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/ybelkada/documentation-images/resolve/main/test-lora-diffusers.png"/>
</div>
Now let's try another cool LoRA model, [ostris/super-cereal-sdxl-lora](https://huggingface.co/ostris/super-cereal-sdxl-lora). All you need to do is load and name this new adapter with `adapter_name`, and use the [`set_adapters`](https://huggingface.co/docs/diffusers/api/loaders/unet#diffusers.loaders.UNet2DConditionLoadersMixin.set_adapters) method to set it as the currently active adapter.
```py
pipeline.load_lora_weights(
"ostris/super-cereal-sdxl-lora",
weight_name="cereal_box_sdxl_v1.safetensors",
adapter_name="cereal"
)
pipeline.set_adapters("cereal")
image = pipeline("sushi rolls shaped like kawaii cat faces").images[0]
image
```
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/ybelkada/documentation-images/resolve/main/test-lora-diffusers-2.png"/>
</div>
Finally, you can call the [`disable_lora`](https://huggingface.co/docs/diffusers/api/loaders/unet#diffusers.loaders.UNet2DConditionLoadersMixin.disable_lora) method to restore the base model.
```py
pipeline.disable_lora()
```
Learn more about how PEFT supports Diffusers in the [Inference with PEFT](https://huggingface.co/docs/diffusers/tutorials/using_peft_for_inference) tutorial.
## Transformers
🤗 [Transformers](https://hf.co/docs/transformers) is a collection of pretrained models for all types of tasks in all modalities. You can load these models for training or inference. Many of the models are large language models (LLMs), so it makes sense to integrate PEFT with Transformers to manage and train adapters.
Load a base pretrained model to train.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
```
Next, add an adapter configuration to specify how to adapt the model parameters. Call the [`~PeftModel.add_adapter`] method to add the configuration to the base model.
```py
from peft import LoraConfig
peft_config = LoraConfig(
lora_alpha=16,
lora_dropout=0.1,
r=64,
bias="none",
task_type="CAUSAL_LM"
)
model.add_adapter(peft_config)
```
Now you can train the model with the Transformers [`~transformers.Trainer`] class or whichever training framework you prefer.
To use the newly trained model for inference, the [`~transformers.AutoModel`] class uses PEFT on the backend to load the adapter weights and configuration file into a base pretrained model.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("peft-internal-testing/opt-350m-lora")
```
Alternatively, you can use the Transformers [Pipelines](https://huggingface.co/docs/transformers/en/main_classes/pipelines) API to load the model and conveniently run inference:
```py
from transformers import pipeline
model = pipeline("text-generation", "peft-internal-testing/opt-350m-lora")
print(model("Hello World"))
```
If you're interested in comparing or using more than one adapter, you can call the [`~PeftModel.add_adapter`] method to add the adapter configuration to the base model. The only requirement is that the adapter types must be the same (you can't mix a LoRA and a LoHa adapter).
```py
from transformers import AutoModelForCausalLM
from peft import LoraConfig
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
lora_config_1 = LoraConfig(task_type="CAUSAL_LM")  # any LoRA configuration works here
model.add_adapter(lora_config_1, adapter_name="adapter_1")
```
Call [`~PeftModel.add_adapter`] again to attach a new adapter to the base model.
```py
lora_config_2 = LoraConfig(task_type="CAUSAL_LM")
model.add_adapter(lora_config_2, adapter_name="adapter_2")
```
Then you can use [`~PeftModel.set_adapter`] to set the currently active adapter.
```py
model.set_adapter("adapter_1")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
To disable the adapter, call the [disable_adapters](https://github.com/huggingface/transformers/blob/4e3490f79b40248c53ee54365a9662611e880892/src/transformers/integrations/peft.py#L313) method.
```py
model.disable_adapters()
```
The [enable_adapters](https://github.com/huggingface/transformers/blob/4e3490f79b40248c53ee54365a9662611e880892/src/transformers/integrations/peft.py#L336) method can be used to enable the adapters again.
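```py
model.enable_adapters()
```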
If you're curious, check out the [Load and train adapters with PEFT](https://huggingface.co/docs/transformers/main/peft) tutorial to learn more.

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# PEFT configurations and models
The sheer size of today's large pretrained models - which commonly have billions of parameters - presents a significant training challenge because these models require more storage space and more computational power to crunch all those calculations. You'll need access to powerful GPUs or TPUs to train these large pretrained models which is expensive, not widely accessible to everyone, not environmentally friendly, and not very practical. PEFT methods address many of these challenges. There are several types of PEFT methods (soft prompting, matrix decomposition, adapters), but they all focus on the same thing: reducing the number of trainable parameters. This makes it more accessible to train and store large models on consumer hardware.
The PEFT library is designed to help you quickly train large models on free or low-cost GPUs, and in this tutorial, you'll learn how to set up a configuration to apply a PEFT method to a pretrained base model for training. Once the PEFT configuration is set up, you can use any training framework you like (the Transformers [`~transformers.Trainer`] class, [Accelerate](https://hf.co/docs/accelerate), a custom PyTorch training loop).
## PEFT configurations
<Tip>
Learn more about the parameters you can configure for each PEFT method in their respective API reference page.
</Tip>
A configuration stores important parameters that specify how a particular PEFT method should be applied.
For example, take a look at the following [`LoraConfig`](https://huggingface.co/ybelkada/opt-350m-lora/blob/main/adapter_config.json) for applying LoRA and [`PromptEncoderConfig`](https://huggingface.co/smangrul/roberta-large-peft-p-tuning/blob/main/adapter_config.json) for applying p-tuning (these configuration files are already JSON-serialized). Whenever you load a PEFT adapter, it is a good idea to check whether it has an associated adapter_config.json file, which is required.
<hfoptions id="config">
<hfoption id="LoraConfig">
```json
{
"base_model_name_or_path": "facebook/opt-350m", #base model to apply LoRA to
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layers_pattern": null,
"layers_to_transform": null,
"lora_alpha": 32,
"lora_dropout": 0.05,
"modules_to_save": null,
"peft_type": "LORA", #PEFT method type
"r": 16,
"revision": null,
"target_modules": [
"q_proj", #model modules to apply LoRA to (query and value projection layers)
"v_proj"
],
"task_type": "CAUSAL_LM" #type of task to train model on
}
```
You can create your own configuration for training by initializing a [`LoraConfig`].
```py
from peft import LoraConfig, TaskType
lora_config = LoraConfig(
r=16,
target_modules=["q_proj", "v_proj"],
task_type=TaskType.CAUSAL_LM,
lora_alpha=32,
lora_dropout=0.05
)
```
</hfoption>
<hfoption id="PromptEncoderConfig">
```json
{
"base_model_name_or_path": "roberta-large", #base model to apply p-tuning to
"encoder_dropout": 0.0,
"encoder_hidden_size": 128,
"encoder_num_layers": 2,
"encoder_reparameterization_type": "MLP",
"inference_mode": true,
"num_attention_heads": 16,
"num_layers": 24,
"num_transformer_submodules": 1,
"num_virtual_tokens": 20,
"peft_type": "P_TUNING", #PEFT method type
"task_type": "SEQ_CLS", #type of task to train model on
"token_dim": 1024
}
```
You can create your own configuration for training by initializing a [`PromptEncoderConfig`].
```py
from peft import PromptEncoderConfig, TaskType
p_tuning_config = PromptEncoderConfig(
encoder_reparameterization_type="MLP",
encoder_hidden_size=128,
num_attention_heads=16,
num_layers=24,
num_transformer_submodules=1,
num_virtual_tokens=20,
token_dim=1024,
task_type=TaskType.SEQ_CLS
)
```
</hfoption>
</hfoptions>
## PEFT models
With a PEFT configuration in hand, you can now apply it to any pretrained model to create a [`PeftModel`]. Choose from any of the state-of-the-art models from the [Transformers](https://hf.co/docs/transformers) library, a custom model, and even new and unsupported transformer architectures.
For this tutorial, load a base [facebook/opt-350m](https://huggingface.co/facebook/opt-350m) model to finetune.
```py
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
```
Use the [`get_peft_model`] function to create a [`PeftModel`] from the base facebook/opt-350m model and the `lora_config` you created earlier.
```py
from peft import get_peft_model
lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()
"trainable params: 1,572,864 || all params: 332,769,280 || trainable%: 0.472659014678278"
```
Now you can train the [`PeftModel`] with your preferred training framework! After training, you can save your model locally with [`~PeftModel.save_pretrained`] or upload it to the Hub with the [`~transformers.PreTrainedModel.push_to_hub`] method.
```py
# save locally
lora_model.save_pretrained("your-name/opt-350m-lora")
# push to Hub
lora_model.push_to_hub("your-name/opt-350m-lora")
```
To load a [`PeftModel`] for inference, you'll need to provide the [`PeftConfig`] used to create it and the base model it was trained from.
```py
from peft import PeftModel, PeftConfig
config = PeftConfig.from_pretrained("ybelkada/opt-350m-lora")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora")
```
<Tip>
By default, the [`PeftModel`] is set for inference, but if you'd like to train the adapter some more you can set `is_trainable=True`.
```py
lora_model = PeftModel.from_pretrained(model, "ybelkada/opt-350m-lora", is_trainable=True)
```
</Tip>
The [`PeftModel.from_pretrained`] method is the most flexible way to load a [`PeftModel`] because it doesn't matter what model framework was used (Transformers, timm, a generic PyTorch model). Other classes, like [`AutoPeftModel`], are just convenient wrappers around the base [`PeftModel`] and make it easier to load PEFT models directly from the Hub or locally where the PEFT weights are stored.
```py
from peft import AutoPeftModelForCausalLM
lora_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")
```
Take a look at the [AutoPeftModel](package_reference/auto_class) API reference to learn more about the [`AutoPeftModel`] classes.
## Next steps
With the appropriate [`PeftConfig`], you can apply it to any pretrained model to create a [`PeftModel`] and train large powerful models faster on freely available GPUs! To learn more about PEFT configurations and models, the following guide may be helpful:
* Learn how to configure a PEFT method for models that aren't from Transformers in the [Working with custom models](../developer_guides/custom_models) guide.

<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Fine-tuning for controllable generation with BOFT (ControlNet)
This guide demonstrates how to use BOFT, an orthogonal fine-tuning method, to fine-tune Stable Diffusion with either the `stabilityai/stable-diffusion-2-1` or the `runwayml/stable-diffusion-v1-5` model for controllable generation.
By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT parameters can be merged into the original model, eliminating any additional computational costs.
As a member of the **orthogonal finetuning** class of methods, BOFT presents a systematic and principled approach to fine-tuning. It possesses several unique properties and has demonstrated superior performance compared to LoRA in a variety of scenarios. For further details on BOFT, please consult the [OFT concept guide in the PEFT documentation](https://huggingface.co/docs/peft/index), the [original BOFT paper](https://arxiv.org/abs/2311.06243), and the [original OFT paper](https://arxiv.org/abs/2306.07280).
In this guide, we provide a controllable generation (ControlNet) fine-tuning script, available in [PEFT's GitHub repo examples](https://github.com/huggingface/peft/tree/main/examples/boft_controlnet). The implementation is adapted from [Diffusers' ControlNet example](https://github.com/huggingface/diffusers/tree/main/examples/controlnet) and [Hecong Wu's ControlLoRA](https://github.com/HighCWu/ControlLoRA). You can try it out and finetune it on your custom images.
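If you'd like a sense of what the script does with the PEFT API before running it, here is a minimal sketch of wrapping a Stable Diffusion UNet with BOFT; the target module names are assumptions based on the Diffusers attention layer naming, and the actual script handles many more details (the ControlNet, data loading, and training loop):
```py
# A minimal sketch, not the training script itself; target_modules are assumptions
# based on the Diffusers UNet attention naming.
from diffusers import UNet2DConditionModel
from peft import BOFTConfig, get_peft_model

unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")
config = BOFTConfig(
    boft_block_num=8,        # the script exposes this as BLOCK_NUM
    boft_block_size=0,       # the script exposes this as BLOCK_SIZE
    boft_dropout=0.1,
    bias="boft_only",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
unet = get_peft_model(unet, config)
unet.print_trainable_parameters()
```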
## Set up your environment
Start by cloning the PEFT repository:
```bash
git clone https://github.com/huggingface/peft
```
Navigate to the directory containing the training scripts for fine-tuning Stable Diffusion with BOFT for controllable generation:
```bash
cd peft/examples/boft_controlnet
```
Set up your environment: install PEFT and all the required libraries. At the time of writing this guide, we recommend installing PEFT from source.
```bash
conda create --name peft python=3.10
conda activate peft
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install xformers -c xformers
pip install -r requirements.txt
pip install git+https://github.com/huggingface/peft
```
## Data
We use the [control-celeba-hq](https://huggingface.co/datasets/oftverse/control-celeba-hq) dataset for landmark-to-face controllable generation. We also provide evaluation scripts to evaluate the controllable generation performance. This task can be used to quantitatively compare different fine-tuning techniques.
```bash
export DATASET_NAME="oftverse/control-celeba-hq"
```
## Train controllable generation (ControlNet) with BOFT
Start by setting some hyperparameters for BOFT:
```bash
PEFT_TYPE="boft"
BLOCK_NUM=8
BLOCK_SIZE=0
N_BUTTERFLY_FACTOR=0
```
Here, `BLOCK_NUM` sets the number of BOFT blocks per injected layer, `BLOCK_SIZE` the size of each orthogonal block, and `N_BUTTERFLY_FACTOR` the number of butterfly factors.
Then launch the training script:
```bash
./train_controlnet.sh
```
or
```bash
export MODEL_NAME="stabilityai/stable-diffusion-2-1"
# export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_NAME="oftverse/control-celeba-hq"
export PROJECT_NAME="controlnet_${PEFT_TYPE}"
export RUN_NAME="${PEFT_TYPE}_${BLOCK_NUM}${BLOCK_SIZE}${N_BUTTERFLY_FACTOR}"
export CONTROLNET_PATH=""
export OUTPUT_DIR="./output/${DATASET_NAME}/${RUN_NAME}"
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--resume_from_checkpoint=$RESUME_PATH \
--controlnet_model_name_or_path=$CONTROLNET_PATH \
--output_dir=$OUTPUT_DIR \
--report_to="wandb" \
--dataset_name=$DATASET_NAME \
--resolution=512 \
--learning_rate=1e-5 \
--checkpointing_steps=5000 \
--max_train_steps=50000 \
--validation_steps=2000 \
--num_validation_images=12 \
--train_batch_size=4 \
--dataloader_num_workers=2 \
--seed="0" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--wandb_project_name=$PROJECT_NAME \
--wandb_run_name=$RUN_NAME \
--enable_xformers_memory_efficient_attention \
--use_boft \
--boft_block_num=$BLOCK_NUM \
--boft_block_size=$BLOCK_SIZE \
--boft_n_butterfly_factor=$N_BUTTERFLY_FACTOR \
--boft_dropout=0.1 \
--boft_bias="boft_only" \
--report_to="wandb" \
```
Run inference on the saved model to sample new images from the validation set:
```bash
./test_controlnet.sh
```
or
```bash
ITER_NUM=50000
export MODEL_NAME="stabilityai/stable-diffusion-2-1"
# export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export RUN_NAME="${PEFT_TYPE}_${BLOCK_NUM}${BLOCK_SIZE}${N_BUTTERFLY_FACTOR}"
export DATASET_NAME="oftverse/control-celeba-hq"
export CKPT_NAME="checkpoint-${ITER_NUM}"
export OUTPUT_DIR="./output/${DATASET_NAME}/${RUN_NAME}/${CKPT_NAME}"
export CONTROLNET_PATH="${OUTPUT_DIR}/controlnet/model.safetensors"
export UNET_PATH="${OUTPUT_DIR}/unet/${RUN_NAME}"
export RESULTS_PATH="${OUTPUT_DIR}/results"
accelerate launch test_controlnet.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--controlnet_path=$CONTROLNET_PATH \
--unet_path=$UNET_PATH \
--adapter_name=$RUN_NAME \
--output_dir=$RESULTS_PATH \
--dataset_name=$DATASET_NAME \
```
Run evaluation on the sampled images to evaluate the landmark reprojection error:
```bash
./eval.sh
```
or
```bash
ITER_NUM=50000
export MODEL_NAME="stabilityai/stable-diffusion-2-1"
# export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export RUN_NAME="${PEFT_TYPE}_${BLOCK_NUM}${BLOCK_SIZE}${N_BUTTERFLY_FACTOR}"
export DATASET_NAME="oftverse/control-celeba-hq"
export CKPT_NAME="checkpoint-${ITER_NUM}"
export OUTPUT_DIR="./output/${DATASET_NAME}/${RUN_NAME}/${CKPT_NAME}"
export CONTROLNET_PATH="${OUTPUT_DIR}/controlnet/model.safetensors"
export UNET_PATH="${OUTPUT_DIR}/unet/${RUN_NAME}"
accelerate launch eval.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--controlnet_path=$CONTROLNET_PATH \
--unet_path=$UNET_PATH \
--adapter_name=$RUN_NAME \
--output_dir=$OUTPUT_DIR \
--dataset_name=$DATASET_NAME \
--vis_overlays \
```

# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# The implementation is based on "Parameter-Efficient Orthogonal Finetuning
# via Butterfly Factorization" (https://arxiv.org/abs/2311.06243) in ICLR 2024.
import glob
import os
from pathlib import Path
import cv2
import face_alignment
import numpy as np
import torch
from accelerate import Accelerator
from skimage.io import imread
from torchvision.utils import save_image
from tqdm import tqdm
from transformers import AutoTokenizer
from utils.args_loader import parse_args
from utils.dataset import make_dataset
detect_model = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, device="cuda:0", flip_input=False)
# with open('./data/celebhq-text/prompt_val_blip_full.json', 'rt') as f: # fill50k, COCO
# for line in f:
# val_data = json.loads(line)
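# Last landmark index of each region in the standard 68-point annotation (jaw, brows,
# nose, eyes, lips); plot_kpts uses this to avoid drawing a line across different regions.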
end_list = np.array([17, 22, 27, 42, 48, 31, 36, 68], dtype=np.int32) - 1
def count_txt_files(directory):
pattern = os.path.join(directory, "*.txt")
txt_files = glob.glob(pattern)
return len(txt_files)
def plot_kpts(image, kpts, color="g"):
    """Draw the 68 facial key points on an image.
    Args:
        image: the input image
        kpts: key points of shape (68, 2+); only the x, y columns are used as coordinates
    """
    if color == "r":
        c = (255, 0, 0)
    elif color == "g":
        c = (0, 255, 0)
    elif color == "b":
        c = (0, 0, 255)
image = image.copy()
kpts = kpts.copy()
radius = max(int(min(image.shape[0], image.shape[1]) / 200), 1)
for i in range(kpts.shape[0]):
st = kpts[i, :2]
if kpts.shape[1] == 4:
if kpts[i, 3] > 0.5:
c = (0, 255, 0)
else:
c = (0, 0, 255)
image = cv2.circle(image, (int(st[0]), int(st[1])), radius, c, radius * 2)
if i in end_list:
continue
ed = kpts[i + 1, :2]
image = cv2.line(image, (int(st[0]), int(st[1])), (int(ed[0]), int(ed[1])), (255, 255, 255), radius)
return image
def generate_landmark2d(dataset, input_dir, pred_lmk_dir, gt_lmk_dir, vis=False):
print("Generate 2d landmarks ...")
os.makedirs(pred_lmk_dir, exist_ok=True)
imagepath_list = sorted(glob.glob(f"{input_dir}/pred*.png"))
for imagepath in tqdm(imagepath_list):
name = Path(imagepath).stem
idx = int(name.split("_")[-1])
pred_txt_path = os.path.join(pred_lmk_dir, f"{idx}.txt")
gt_lmk_path = os.path.join(gt_lmk_dir, f"{idx}_gt_lmk.jpg")
gt_txt_path = os.path.join(gt_lmk_dir, f"{idx}.txt")
gt_img_path = os.path.join(gt_lmk_dir, f"{idx}_gt_img.jpg")
if (not os.path.exists(pred_txt_path)) or (not os.path.exists(gt_txt_path)):
image = imread(imagepath) # [:, :, :3]
out = detect_model.get_landmarks(image)
if out is None:
continue
pred_kpt = out[0].squeeze()
np.savetxt(pred_txt_path, pred_kpt)
# Your existing code for obtaining the image tensor
gt_lmk_img = dataset[idx]["conditioning_pixel_values"]
save_image(gt_lmk_img, gt_lmk_path)
gt_img = (dataset[idx]["pixel_values"]) * 0.5 + 0.5
save_image(gt_img, gt_img_path)
gt_img = (gt_img.permute(1, 2, 0) * 255).type(torch.uint8).cpu().numpy()
out = detect_model.get_landmarks(gt_img)
if out is None:
continue
gt_kpt = out[0].squeeze()
np.savetxt(gt_txt_path, gt_kpt)
# gt_image = cv2.resize(cv2.imread(gt_lmk_path), (512, 512))
if vis:
gt_lmk_image = cv2.imread(gt_lmk_path)
# visualize predicted landmarks
vis_path = os.path.join(pred_lmk_dir, f"{idx}_overlay.jpg")
image = cv2.imread(imagepath)
image_point = plot_kpts(image, pred_kpt)
cv2.imwrite(vis_path, np.concatenate([image_point, gt_lmk_image], axis=1))
# visualize gt landmarks
vis_path = os.path.join(gt_lmk_dir, f"{idx}_overlay.jpg")
image = cv2.imread(gt_img_path)
image_point = plot_kpts(image, gt_kpt)
cv2.imwrite(vis_path, np.concatenate([image_point, gt_lmk_image], axis=1))
def landmark_comparison(val_dataset, lmk_dir, gt_lmk_dir):
print("Calculating reprojection error")
lmk_err = []
pbar = tqdm(range(len(val_dataset)))
for i in pbar:
# line = val_dataset[i]
# img_name = line["image"].split(".")[0]
lmk1_path = os.path.join(gt_lmk_dir, f"{i}.txt")
lmk1 = np.loadtxt(lmk1_path)
lmk2_path = os.path.join(lmk_dir, f"{i}.txt")
if not os.path.exists(lmk2_path):
print(f"{lmk2_path} not exist")
continue
lmk2 = np.loadtxt(lmk2_path)
lmk_err.append(np.mean(np.linalg.norm(lmk1 - lmk2, axis=1)))
pbar.set_description(f"lmk_err: {np.mean(lmk_err):.5f}")
print("Reprojection error:", np.mean(lmk_err))
np.save(os.path.join(lmk_dir, "lmk_err.npy"), lmk_err)
def main(args):
logging_dir = Path(args.output_dir, args.logging_dir)
accelerator = Accelerator(
gradient_accumulation_steps=args.gradient_accumulation_steps,
mixed_precision=args.mixed_precision,
log_with=args.report_to,
project_dir=logging_dir,
)
# Load the tokenizer
if args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, revision=args.revision, use_fast=False)
elif args.pretrained_model_name_or_path:
tokenizer = AutoTokenizer.from_pretrained(
args.pretrained_model_name_or_path,
subfolder="tokenizer",
revision=args.revision,
use_fast=False,
)
val_dataset = make_dataset(args, tokenizer, accelerator, "test")
gt_lmk_dir = os.path.join(args.output_dir, "gt_lmk")
if not os.path.exists(gt_lmk_dir):
os.makedirs(gt_lmk_dir, exist_ok=True)
pred_lmk_dir = os.path.join(args.output_dir, "pred_lmk")
if not os.path.exists(pred_lmk_dir):
os.makedirs(pred_lmk_dir, exist_ok=True)
input_dir = os.path.join(args.output_dir, "results")
generate_landmark2d(val_dataset, input_dir, pred_lmk_dir, gt_lmk_dir, args.vis_overlays)
if count_txt_files(pred_lmk_dir) == len(val_dataset) and count_txt_files(gt_lmk_dir) == len(val_dataset):
landmark_comparison(val_dataset, pred_lmk_dir, gt_lmk_dir)
if __name__ == "__main__":
args = parse_args()
main(args)

View File

@ -0,0 +1,29 @@
PEFT_TYPE="boft"
BLOCK_NUM=8
BLOCK_SIZE=0
N_BUTTERFLY_FACTOR=1
ITER_NUM=50000
export RUN_NAME="${PEFT_TYPE}_${BLOCK_NUM}${BLOCK_SIZE}${N_BUTTERFLY_FACTOR}"
export MODEL_NAME="stabilityai/stable-diffusion-2-1"
# export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_NAME="oftverse/control-celeba-hq"
export CKPT_NAME="checkpoint-${ITER_NUM}"
export OUTPUT_DIR="./output/${DATASET_NAME}/${RUN_NAME}/${CKPT_NAME}"
export CONTROLNET_PATH="${OUTPUT_DIR}/controlnet/model.safetensors"
export UNET_PATH="${OUTPUT_DIR}/unet/${RUN_NAME}"
accelerate launch eval.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--controlnet_path=$CONTROLNET_PATH \
--unet_path=$UNET_PATH \
--adapter_name=$RUN_NAME \
--output_dir=$OUTPUT_DIR \
--vis_overlays \

View File

@ -0,0 +1,8 @@
datasets==2.16.1
diffusers==0.17.1
transformers==4.36.2
accelerate==0.25.0
wandb==0.16.1
scikit-image==0.22.0
opencv-python==4.9.0.80
face-alignment==1.4.1

View File

@ -0,0 +1,129 @@
# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# The implementation is based on "Parameter-Efficient Orthogonal Finetuning
# via Butterfly Factorization" (https://arxiv.org/abs/2311.06243) in ICLR 2024.
import os
import sys
import time
from pathlib import Path
import numpy as np
import torch
import torch.utils.checkpoint
from accelerate import Accelerator
from diffusers import DDIMScheduler
from diffusers.utils import check_min_version
from safetensors.torch import load_file
from tqdm import tqdm
from transformers import AutoTokenizer
from utils.args_loader import parse_args
from utils.dataset import make_dataset
from utils.light_controlnet import ControlNetModel
from utils.pipeline_controlnet import LightControlNetPipeline
from utils.unet_2d_condition import UNet2DConditionNewModel
sys.path.append("../../src")
from peft import PeftModel # noqa: E402
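# The sys.path tweak above makes the example import PEFT from the repository's local source tree rather than an installed package.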
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.10.0.dev0")
device = torch.device("cuda:0")
def main(args):
logging_dir = Path(args.output_dir, args.logging_dir)
accelerator = Accelerator(
gradient_accumulation_steps=args.gradient_accumulation_steps,
mixed_precision=args.mixed_precision,
log_with=args.report_to,
project_dir=logging_dir,
)
# Load the tokenizer
if args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, revision=args.revision, use_fast=False)
elif args.pretrained_model_name_or_path:
tokenizer = AutoTokenizer.from_pretrained(
args.pretrained_model_name_or_path,
subfolder="tokenizer",
revision=args.revision,
use_fast=False,
)
val_dataset = make_dataset(args, tokenizer, accelerator, "test")
controlnet_path = args.controlnet_path
unet_path = args.unet_path
controlnet = ControlNetModel()
controlnet.load_state_dict(load_file(controlnet_path))
unet = UNet2DConditionNewModel.from_pretrained(args.pretrained_model_name_or_path, subfolder="unet")
unet = PeftModel.from_pretrained(unet, unet_path, adapter_name=args.adapter_name)
pipe = LightControlNetPipeline.from_pretrained(
args.pretrained_model_name_or_path,
controlnet=controlnet,
unet=unet.model,
torch_dtype=torch.float32,
requires_safety_checker=False,
).to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
if not os.path.exists(args.output_dir):
os.makedirs(args.output_dir, exist_ok=True)
exist_lst = [int(img.split("_")[-1][:-4]) for img in os.listdir(args.output_dir)]
all_lst = np.arange(len(val_dataset))
idx_lst = [item for item in all_lst if item not in exist_lst]
print("Number of images to be processed: ", len(idx_lst))
np.random.seed(seed=int(time.time()))
np.random.shuffle(idx_lst)
for idx in tqdm(idx_lst):
output_path = os.path.join(args.output_dir, f"pred_img_{idx:04d}.png")
if not os.path.exists(output_path):
data = val_dataset[idx.item()]
negative_prompt = "low quality, blurry, unfinished"
with torch.no_grad():
pred_img = pipe(
data["text"],
[data["conditioning_pixel_values"]],
num_inference_steps=50,
guidance_scale=7,
negative_prompt=negative_prompt,
).images[0]
pred_img.save(output_path)
# control_img = Image.fromarray(
# (data["conditioning_pixel_value"] * 255).numpy().transpose(1, 2, 0).astype(np.uint8)
# )
# gt_img = Image.fromarray(
# ((data["pixel_value"] + 1.0) * 0.5 * 255).numpy().transpose(1, 2, 0).astype(np.uint8)
# )
if __name__ == "__main__":
args = parse_args()
main(args)

View File

@ -0,0 +1,29 @@
PEFT_TYPE="boft"
BLOCK_NUM=8
BLOCK_SIZE=0
N_BUTTERFLY_FACTOR=1
ITER_NUM=50000
export RUN_NAME="${PEFT_TYPE}_${BLOCK_NUM}${BLOCK_SIZE}${N_BUTTERFLY_FACTOR}"
export MODEL_NAME="stabilityai/stable-diffusion-2-1"
# export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export DATASET_NAME="oftverse/control-celeba-hq"
export CKPT_NAME="checkpoint-${ITER_NUM}"
export OUTPUT_DIR="./output/${DATASET_NAME}/${RUN_NAME}/${CKPT_NAME}"
export CONTROLNET_PATH="${OUTPUT_DIR}/controlnet/model.safetensors"
export UNET_PATH="${OUTPUT_DIR}/unet/${RUN_NAME}"
export RESULTS_PATH="${OUTPUT_DIR}/results"
accelerate launch test_controlnet.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--controlnet_path=$CONTROLNET_PATH \
--unet_path=$UNET_PATH \
--adapter_name=$RUN_NAME \
--output_dir=$RESULTS_PATH \

View File

@ -0,0 +1,537 @@
#!/usr/bin/env python
# Copyright 2023-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# The implementation is based on "Parameter-Efficient Orthogonal Finetuning
# via Butterfly Factorization" (https://arxiv.org/abs/2311.06243) in ICLR 2024.
import itertools
import logging
import math
import os
from pathlib import Path
import datasets
import diffusers
import torch
import torch.nn.functional as F
import torch.utils.checkpoint
import transformers
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed
from diffusers import (
AutoencoderKL,
DDIMScheduler,
)
from diffusers.optimization import get_scheduler
from diffusers.utils import check_min_version
from diffusers.utils.import_utils import is_xformers_available
from packaging import version
from tqdm.auto import tqdm
from transformers import AutoTokenizer
from utils.args_loader import (
import_model_class_from_model_name_or_path,
parse_args,
)
from utils.dataset import collate_fn, log_validation, make_dataset
from utils.light_controlnet import ControlNetModel
from utils.tracemalloc import TorchTracemalloc, b2mb
from utils.unet_2d_condition import UNet2DConditionNewModel
from peft import BOFTConfig, get_peft_model
from peft.peft_model import PeftModel
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.16.0.dev0")
logger = get_logger(__name__)
UNET_TARGET_MODULES = ["to_q", "to_v", "to_k", "query", "value", "key"]
TEXT_ENCODER_TARGET_MODULES = ["q_proj", "v_proj"]
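# BOFT adapters are injected into these attention projections: the query/key/value projections of the UNet and the q/v projections of the text encoder.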
@torch.no_grad()
def save_adaptor(accelerator, output_dir, nets_dict):
    for net_key in nets_dict.keys():
        net_model = nets_dict[net_key]
        unwrapped_net = accelerator.unwrap_model(net_model)
        if isinstance(unwrapped_net, PeftModel):
            unwrapped_net.save_pretrained(
                os.path.join(output_dir, net_key),
                state_dict=accelerator.get_state_dict(net_model),
                safe_serialization=True,
            )
        else:
            accelerator.save_model(
                unwrapped_net,
                os.path.join(output_dir, net_key),
                safe_serialization=True,
            )
def main(args):
logging_dir = Path(args.output_dir, args.logging_dir)
accelerator = Accelerator(
gradient_accumulation_steps=args.gradient_accumulation_steps,
mixed_precision=args.mixed_precision,
log_with=args.report_to,
project_dir=logging_dir,
)
if args.report_to == "wandb":
wandb_init = {
"wandb": {
"name": args.wandb_run_name,
"mode": "online",
}
}
# Make one log on every process with the configuration for debugging.
logging.basicConfig(
format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO,
)
logger.info(accelerator.state, main_process_only=False)
if accelerator.is_local_main_process:
datasets.utils.logging.set_verbosity_warning()
transformers.utils.logging.set_verbosity_warning()
diffusers.utils.logging.set_verbosity_info()
else:
datasets.utils.logging.set_verbosity_error()
transformers.utils.logging.set_verbosity_error()
diffusers.utils.logging.set_verbosity_error()
# If passed along, set the training seed now.
if args.seed is not None:
set_seed(args.seed)
# Handle the repository creation
if accelerator.is_main_process:
if args.output_dir is not None:
os.makedirs(args.output_dir, exist_ok=True)
# Load the tokenizer
if args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, revision=args.revision, use_fast=False)
elif args.pretrained_model_name_or_path:
tokenizer = AutoTokenizer.from_pretrained(
args.pretrained_model_name_or_path,
subfolder="tokenizer",
revision=args.revision,
use_fast=False,
)
# import correct text encoder class
text_encoder_cls = import_model_class_from_model_name_or_path(args.pretrained_model_name_or_path, args.revision)
# Load scheduler and models
noise_scheduler = DDIMScheduler.from_pretrained(args.pretrained_model_name_or_path, subfolder="scheduler")
text_encoder = text_encoder_cls.from_pretrained(
args.pretrained_model_name_or_path, subfolder="text_encoder", revision=args.revision
)
vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae", revision=args.revision)
unet = UNet2DConditionNewModel.from_pretrained(
args.pretrained_model_name_or_path,
subfolder="unet",
revision=args.revision,
)
controlnet = ControlNetModel()
if args.controlnet_model_name_or_path != "":
logger.info(f"Loading existing controlnet weights from {args.controlnet_model_name_or_path}")
controlnet.load_state_dict(torch.load(args.controlnet_model_name_or_path))
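    # Wrap the UNet with a BOFT adapter: orthogonal butterfly-factorized transforms are inserted into the target
    # attention projections, and only these adapter parameters (plus the biases selected by --boft_bias) stay trainable.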
if args.use_boft:
config = BOFTConfig(
boft_block_size=args.boft_block_size,
boft_block_num=args.boft_block_num,
boft_n_butterfly_factor=args.boft_n_butterfly_factor,
target_modules=UNET_TARGET_MODULES,
boft_dropout=args.boft_dropout,
bias=args.boft_bias,
)
unet = get_peft_model(unet, config)
unet.print_trainable_parameters()
vae.requires_grad_(False)
controlnet.requires_grad_(True)
if not args.train_text_encoder:
text_encoder.requires_grad_(False)
unet.train()
controlnet.train()
if args.train_text_encoder and args.use_boft:
config = BOFTConfig(
boft_block_size=args.boft_block_size,
boft_block_num=args.boft_block_num,
boft_n_butterfly_factor=args.boft_n_butterfly_factor,
target_modules=TEXT_ENCODER_TARGET_MODULES,
boft_dropout=args.boft_dropout,
bias=args.boft_bias,
)
text_encoder = get_peft_model(text_encoder, config, adapter_name=args.wandb_run_name)
text_encoder.print_trainable_parameters()
if args.train_text_encoder:
text_encoder.train()
# For mixed precision training we cast the text_encoder and vae weights to half-precision
# as these models are only used for inference, keeping weights in full precision is not required.
weight_dtype = torch.float32
if accelerator.mixed_precision == "fp16":
weight_dtype = torch.float16
elif accelerator.mixed_precision == "bf16":
weight_dtype = torch.bfloat16
# Move unet, vae and text_encoder to device and cast to weight_dtype
unet.to(accelerator.device, dtype=weight_dtype)
vae.to(accelerator.device, dtype=weight_dtype)
controlnet.to(accelerator.device, dtype=weight_dtype)
if not args.train_text_encoder:
text_encoder.to(accelerator.device, dtype=weight_dtype)
if args.enable_xformers_memory_efficient_attention:
if is_xformers_available():
import xformers
xformers_version = version.parse(xformers.__version__)
if xformers_version == version.parse("0.0.16"):
logger.warning(
"xFormers 0.0.16 cannot be used for training in some GPUs. If you observe problems during training, please update xFormers to at least 0.0.17. See https://huggingface.co/docs/diffusers/main/en/optimization/xformers for more details."
)
unet.enable_xformers_memory_efficient_attention()
controlnet.enable_xformers_memory_efficient_attention()
if args.train_text_encoder and not (args.use_lora or args.use_boft or args.use_oft):
text_encoder.enable_xformers_memory_efficient_attention()
else:
raise ValueError("xformers is not available. Make sure it is installed correctly")
if args.gradient_checkpointing:
controlnet.enable_gradient_checkpointing()
unet.enable_gradient_checkpointing()
if args.train_text_encoder and not (args.use_lora or args.use_boft or args.use_oft):
text_encoder.gradient_checkpointing_enable()
# Check that all trainable models are in full precision
low_precision_error_string = (
" Please make sure to always have all model weights in full float32 precision when starting training - even if"
" doing mixed precision training, copy of the weights should still be float32."
)
if accelerator.unwrap_model(controlnet).dtype != torch.float32:
raise ValueError(
f"Controlnet loaded as datatype {accelerator.unwrap_model(controlnet).dtype}. {low_precision_error_string}"
)
if accelerator.unwrap_model(unet).dtype != torch.float32:
raise ValueError(
f"UNet loaded as datatype {accelerator.unwrap_model(unet).dtype}. {low_precision_error_string}"
)
# Enable TF32 for faster training on Ampere GPUs,
# cf https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices
if args.allow_tf32:
torch.backends.cuda.matmul.allow_tf32 = True
if args.scale_lr:
args.learning_rate = (
args.learning_rate * args.gradient_accumulation_steps * args.train_batch_size * accelerator.num_processes
)
# Use 8-bit Adam for lower memory usage or to fine-tune the model in 16GB GPUs
if args.use_8bit_adam:
try:
import bitsandbytes as bnb
except ImportError:
raise ImportError(
"To use 8-bit Adam, please install the bitsandbytes library: `pip install bitsandbytes`."
)
optimizer_class = bnb.optim.AdamW8bit
else:
optimizer_class = torch.optim.AdamW
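    # Only parameters left trainable above are optimized: the ControlNet, the BOFT parameters injected into the UNet,
    # and (when --train_text_encoder is set) the text encoder adapter.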
params_to_optimize = [param for param in controlnet.parameters() if param.requires_grad]
params_to_optimize += [param for param in unet.parameters() if param.requires_grad]
if args.train_text_encoder:
params_to_optimize += [param for param in text_encoder.parameters() if param.requires_grad]
# Optimizer creation
optimizer = optimizer_class(
params_to_optimize,
lr=args.learning_rate,
betas=(args.adam_beta1, args.adam_beta2),
weight_decay=args.adam_weight_decay,
eps=args.adam_epsilon,
)
# Load the dataset
train_dataset = make_dataset(args, tokenizer, accelerator, "train")
val_dataset = make_dataset(args, tokenizer, accelerator, "test")
train_dataloader = torch.utils.data.DataLoader(
train_dataset,
shuffle=True,
collate_fn=collate_fn,
batch_size=args.train_batch_size,
num_workers=args.dataloader_num_workers,
)
# Scheduler and math around the number of training steps.
overrode_max_train_steps = False
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if args.max_train_steps is None:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
overrode_max_train_steps = True
lr_scheduler = get_scheduler(
args.lr_scheduler,
optimizer=optimizer,
num_warmup_steps=args.lr_warmup_steps * args.gradient_accumulation_steps,
num_training_steps=args.max_train_steps * args.gradient_accumulation_steps,
num_cycles=args.lr_num_cycles,
power=args.lr_power,
)
# Prepare everything with our `accelerator`.
controlnet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
controlnet, optimizer, train_dataloader, lr_scheduler
)
if args.train_text_encoder:
text_encoder = accelerator.prepare(text_encoder)
# We need to recalculate our total training steps as the size of the training dataloader may have changed.
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
if overrode_max_train_steps:
args.max_train_steps = args.num_train_epochs * num_update_steps_per_epoch
# Afterwards we recalculate our number of training epochs
args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
# We need to initialize the trackers we use, and also store our configuration.
# The trackers initializes automatically on the main process.
if accelerator.is_main_process:
accelerator.init_trackers(args.wandb_project_name, config=vars(args), init_kwargs=wandb_init)
# Train!
total_batch_size = args.train_batch_size * accelerator.num_processes * args.gradient_accumulation_steps
logger.info("***** Running training *****")
logger.info(f" Num examples = {len(train_dataset)}")
logger.info(f" Num batches each epoch = {len(train_dataloader)}")
logger.info(f" Num Epochs = {args.num_train_epochs}")
logger.info(f" Instantaneous batch size per device = {args.train_batch_size}")
logger.info(f" Total train batch size (w. parallel, distributed & accumulation) = {total_batch_size}")
logger.info(f" Gradient Accumulation steps = {args.gradient_accumulation_steps}")
logger.info(f" Total optimization steps = {args.max_train_steps}")
global_step = 0
first_epoch = 0
# Potentially load in the weights and states from a previous save
if args.resume_from_checkpoint:
if args.resume_from_checkpoint != "latest":
path = os.path.basename(args.resume_from_checkpoint)
else:
# Get the most recent checkpoint
dirs = os.listdir(args.output_dir)
if "checkpoint-current" in dirs:
path = "checkpoint-current"
dirs = [d for d in dirs if d.startswith("checkpoint") and d.endswith("0")]
dirs = sorted(dirs, key=lambda x: int(x.split("-")[1]))
else:
dirs = [d for d in dirs if d.startswith("checkpoint")]
dirs = sorted(dirs, key=lambda x: int(x.split("-")[1]))
path = dirs[-1] if len(dirs) > 0 else None
if path is None:
accelerator.print(
f"Checkpoint '{args.resume_from_checkpoint}' does not exist. Starting a new training run."
)
args.resume_from_checkpoint = None
initial_global_step = 0
else:
accelerator.print(f"Resuming from checkpoint {path}")
accelerator.load_state(os.path.join(args.output_dir, path))
if path.split("-")[1] == "current":
global_step = int(dirs[-1].split("-")[1])
else:
global_step = int(path.split("-")[1])
initial_global_step = global_step
resume_global_step = global_step * args.gradient_accumulation_steps
first_epoch = global_step // num_update_steps_per_epoch
resume_step = resume_global_step % (num_update_steps_per_epoch * args.gradient_accumulation_steps)
else:
initial_global_step = 0
progress_bar = tqdm(
range(0, args.max_train_steps),
initial=initial_global_step,
desc="Steps",
disable=not accelerator.is_local_main_process,
)
progress_bar.set_description("Steps")
for epoch in range(first_epoch, args.num_train_epochs):
with TorchTracemalloc() as tracemalloc:
for step, batch in enumerate(train_dataloader):
# Skip steps until we reach the resumed step
if args.resume_from_checkpoint and epoch == first_epoch and step < resume_step:
if step % args.gradient_accumulation_steps == 0:
progress_bar.update(1)
if args.report_to == "wandb":
accelerator.print(progress_bar)
continue
with accelerator.accumulate(controlnet), accelerator.accumulate(unet):
# Convert images to latent space
latents = vae.encode(batch["pixel_values"].to(dtype=weight_dtype)).latent_dist.sample()
latents = latents * vae.config.scaling_factor
# Sample noise that we'll add to the latents
noise = torch.randn_like(latents)
bsz = latents.shape[0]
# Sample a random timestep for each image
timesteps = torch.randint(
0, noise_scheduler.config.num_train_timesteps, (bsz,), device=latents.device
)
timesteps = timesteps.long()
# Add noise to the latents according to the noise magnitude at each timestep
# (this is the forward diffusion process)
noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
# Get the text embedding for conditioning
encoder_hidden_states = text_encoder(batch["input_ids"])[0]
controlnet_image = batch["conditioning_pixel_values"].to(dtype=weight_dtype)
# Get the guided hint for the UNet (320 dim)
guided_hint = controlnet(
controlnet_cond=controlnet_image,
)
# Predict the noise residual
model_pred = unet(
noisy_latents,
timesteps,
guided_hint=guided_hint,
encoder_hidden_states=encoder_hidden_states,
).sample
# Get the target for loss depending on the prediction type
if noise_scheduler.config.prediction_type == "epsilon":
target = noise
elif noise_scheduler.config.prediction_type == "v_prediction":
target = noise_scheduler.get_velocity(latents, noise, timesteps)
else:
raise ValueError(f"Unknown prediction type {noise_scheduler.config.prediction_type}")
loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")
accelerator.backward(loss)
if accelerator.sync_gradients:
params_to_clip = (
itertools.chain(controlnet.parameters(), text_encoder.parameters())
if args.train_text_encoder
else itertools.chain(
controlnet.parameters(),
)
)
accelerator.clip_grad_norm_(params_to_clip, args.max_grad_norm)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad(set_to_none=args.set_grads_to_none)
# Checks if the accelerator has performed an optimization step behind the scenes
if accelerator.sync_gradients:
progress_bar.update(1)
if args.report_to == "wandb":
accelerator.print(progress_bar)
global_step += 1
step_save_path = os.path.join(args.output_dir, f"checkpoint-{global_step}")
if accelerator.is_main_process:
if global_step % args.validation_steps == 0 or global_step == 1:
logger.info(f"Running validation... \n Generating {args.num_validation_images} images.")
logger.info("Running validation... ")
with torch.no_grad():
log_validation(val_dataset, text_encoder, unet, controlnet, args, accelerator)
if global_step % args.checkpointing_steps == 0:
save_adaptor(accelerator, step_save_path, {"controlnet": controlnet, "unet": unet})
# save text_encoder if any
if args.train_text_encoder:
save_adaptor(accelerator, step_save_path, {"text_encoder": text_encoder})
accelerator.save_state(step_save_path)
logger.info(f"Saved {global_step} state to {step_save_path}")
logger.info(f"Saved current state to {step_save_path}")
logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0]}
progress_bar.set_postfix(**logs)
accelerator.log(logs, step=global_step)
if global_step >= args.max_train_steps:
break
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(f"GPU Memory before entering the train : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the train (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}")
accelerator.print(
f"GPU Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the train : {b2mb(tracemalloc.cpu_begin)}")
accelerator.print(f"CPU Memory consumed at the end of the train (end-begin): {tracemalloc.cpu_used}")
accelerator.print(f"CPU Peak Memory consumed during the train (max-begin): {tracemalloc.cpu_peaked}")
accelerator.print(
f"CPU Total Peak Memory consumed during the train (max): {tracemalloc.cpu_peaked + b2mb(tracemalloc.cpu_begin)}"
)
# Create the pipeline using using the trained modules and save it.
accelerator.wait_for_everyone()
accelerator.end_training()
if __name__ == "__main__":
args = parse_args()
main(args)

View File

@ -0,0 +1,42 @@
PEFT_TYPE="boft"
BLOCK_NUM=8
BLOCK_SIZE=0
N_BUTTERFLY_FACTOR=1
export DATASET_NAME="oftverse/control-celeba-hq"
export PROJECT_NAME="controlnet_${PEFT_TYPE}"
export RUN_NAME="${PEFT_TYPE}_${BLOCK_NUM}${BLOCK_SIZE}${N_BUTTERFLY_FACTOR}"
export CONTROLNET_PATH=""
export MODEL_NAME="stabilityai/stable-diffusion-2-1"
# export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export OUTPUT_DIR="./output/${DATASET_NAME}/${RUN_NAME}"
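# CONTROLNET_PATH is left empty so the lightweight ControlNet is trained from scratch;
# RESUME_PATH is unset here, so training starts fresh (point it at a checkpoint directory to resume).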
accelerate launch train_controlnet.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--resume_from_checkpoint=$RESUME_PATH \
--controlnet_model_name_or_path=$CONTROLNET_PATH \
--output_dir=$OUTPUT_DIR \
--report_to="wandb" \
--dataset_name=$DATASET_NAME \
--resolution=512 \
--learning_rate=1e-5 \
--checkpointing_steps=500 \
--max_train_steps=50000 \
--validation_steps=5000 \
--num_validation_images=12 \
--train_batch_size=4 \
--dataloader_num_workers=2 \
--seed="0" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--wandb_project_name=$PROJECT_NAME \
--wandb_run_name=$RUN_NAME \
--enable_xformers_memory_efficient_attention \
--use_boft \
--boft_block_num=$BLOCK_NUM \
--boft_block_size=$BLOCK_SIZE \
--boft_n_butterfly_factor=$N_BUTTERFLY_FACTOR \
--boft_dropout=0.1 \
--boft_bias="boft_only" \

View File

@ -0,0 +1 @@

View File

@ -0,0 +1,447 @@
import argparse
import os
from typing import Optional
from huggingface_hub import HfFolder, whoami
from transformers import PretrainedConfig
def get_full_repo_name(model_id: str, organization: Optional[str] = None, token: Optional[str] = None):
if token is None:
token = HfFolder.get_token()
if organization is None:
username = whoami(token)["name"]
return f"{username}/{model_id}"
else:
return f"{organization}/{model_id}"
def import_model_class_from_model_name_or_path(pretrained_model_name_or_path: str, revision: str):
text_encoder_config = PretrainedConfig.from_pretrained(
pretrained_model_name_or_path,
subfolder="text_encoder",
revision=revision,
)
model_class = text_encoder_config.architectures[0]
if model_class == "CLIPTextModel":
from transformers import CLIPTextModel
return CLIPTextModel
elif model_class == "RobertaSeriesModelWithTransformation":
from diffusers.pipelines.alt_diffusion.modeling_roberta_series import (
RobertaSeriesModelWithTransformation,
)
return RobertaSeriesModelWithTransformation
else:
raise ValueError(f"{model_class} is not supported.")
def parse_args(input_args=None):
parser = argparse.ArgumentParser(description="Simple example of a ControlNet training script.")
parser.add_argument(
"--pretrained_model_name_or_path",
type=str,
default=None,
required=True,
help="Path to pretrained model or model identifier from huggingface.co/models.",
)
parser.add_argument(
"--controlnet_model_name_or_path",
type=str,
default=None,
help="Path to pretrained controlnet model or model identifier from huggingface.co/models."
" If not specified controlnet weights are initialized from unet.",
)
parser.add_argument(
"--revision",
type=str,
default=None,
required=False,
help=(
"Revision of pretrained model identifier from huggingface.co/models. Trainable model components should be"
" float32 precision."
),
)
parser.add_argument(
"--tokenizer_name",
type=str,
default=None,
help="Pretrained tokenizer name or path if not the same as model_name",
)
parser.add_argument(
"--output_dir",
type=str,
default="controlnet-model",
help="The output directory where the model predictions and checkpoints will be written.",
)
parser.add_argument(
"--cache_dir",
type=str,
default=None,
help="The directory where the downloaded models and datasets will be stored.",
)
parser.add_argument("--seed", type=int, default=None, help="A seed for reproducible training.")
parser.add_argument(
"--resolution",
type=int,
default=512,
help=(
"The resolution for input images, all the images in the train/validation dataset will be resized to this"
" resolution"
),
)
parser.add_argument("--train_text_encoder", action="store_true", help="Whether to train the text encoder")
parser.add_argument(
"--train_batch_size", type=int, default=4, help="Batch size (per device) for the training dataloader."
)
parser.add_argument(
"--sample_batch_size", type=int, default=4, help="Batch size (per device) for sampling images."
)
parser.add_argument("--num_train_epochs", type=int, default=1)
parser.add_argument(
"--max_train_steps",
type=int,
default=None,
help="Total number of training steps to perform. If provided, overrides num_train_epochs.",
)
parser.add_argument(
"--checkpointing_steps",
type=int,
default=500,
help=(
"Save a checkpoint of the training state every X updates. Checkpoints can be used for resuming training via `--resume_from_checkpoint`. "
"In the case that the checkpoint is better than the final trained model, the checkpoint can also be used for inference."
"Using a checkpoint for inference requires separate loading of the original pipeline and the individual checkpointed model components."
"See https://huggingface.co/docs/diffusers/main/en/training/dreambooth#performing-inference-using-a-saved-checkpoint for step by step"
"instructions."
),
)
parser.add_argument(
"--checkpoints_total_limit",
type=int,
default=None,
help=("Max number of checkpoints to store."),
)
parser.add_argument(
"--resume_from_checkpoint",
type=str,
default=None,
help=(
"Whether training should be resumed from a previous checkpoint. Use a path saved by"
' `--checkpointing_steps`, or `"latest"` to automatically select the last available checkpoint.'
),
)
parser.add_argument(
"--gradient_accumulation_steps",
type=int,
default=1,
help="Number of updates steps to accumulate before performing a backward/update pass.",
)
parser.add_argument(
"--gradient_checkpointing",
action="store_true",
help="Whether or not to use gradient checkpointing to save memory at the expense of slower backward pass.",
)
parser.add_argument(
"--learning_rate",
type=float,
default=5e-6,
help="Initial learning rate (after the potential warmup period) to use.",
)
parser.add_argument(
"--scale_lr",
action="store_true",
default=False,
help="Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.",
)
parser.add_argument(
"--lr_scheduler",
type=str,
default="constant",
help=(
'The scheduler type to use. Choose between ["linear", "cosine", "cosine_with_restarts", "polynomial",'
' "constant", "constant_with_warmup"]'
),
)
parser.add_argument(
"--lr_warmup_steps", type=int, default=500, help="Number of steps for the warmup in the lr scheduler."
)
parser.add_argument(
"--lr_num_cycles",
type=int,
default=1,
help="Number of hard resets of the lr in cosine_with_restarts scheduler.",
)
parser.add_argument("--lr_power", type=float, default=1.0, help="Power factor of the polynomial scheduler.")
parser.add_argument(
"--use_8bit_adam", action="store_true", help="Whether or not to use 8-bit Adam from bitsandbytes."
)
parser.add_argument(
"--dataloader_num_workers",
type=int,
default=0,
help=(
"Number of subprocesses to use for data loading. 0 means that the data will be loaded in the main process."
),
)
parser.add_argument("--adam_beta1", type=float, default=0.9, help="The beta1 parameter for the Adam optimizer.")
parser.add_argument("--adam_beta2", type=float, default=0.999, help="The beta2 parameter for the Adam optimizer.")
parser.add_argument("--adam_weight_decay", type=float, default=1e-2, help="Weight decay to use.")
parser.add_argument("--adam_epsilon", type=float, default=1e-08, help="Epsilon value for the Adam optimizer")
parser.add_argument("--max_grad_norm", default=1.0, type=float, help="Max gradient norm.")
parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
parser.add_argument("--hub_token", type=str, default=None, help="The token to use to push to the Model Hub.")
parser.add_argument(
"--hub_model_id",
type=str,
default=None,
help="The name of the repository to keep in sync with the local `output_dir`.",
)
parser.add_argument(
"--logging_dir",
type=str,
default="logs",
help=(
"[TensorBoard](https://www.tensorflow.org/tensorboard) log directory. Will default to"
" *output_dir/runs/**CURRENT_DATETIME_HOSTNAME***."
),
)
parser.add_argument(
"--allow_tf32",
action="store_true",
help=(
"Whether or not to allow TF32 on Ampere GPUs. Can be used to speed up training. For more information, see"
" https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices"
),
)
parser.add_argument(
"--report_to",
type=str,
default="wandb",
help=(
            'The integration to report the results and logs to. Supported platforms are `"tensorboard"`,'
            ' `"wandb"` (default) and `"comet_ml"`. Use `"all"` to report to all integrations.'
),
)
parser.add_argument(
"--wandb_key",
type=str,
default=None,
help=("If report to option is set to wandb, api-key for wandb used for login to wandb "),
)
parser.add_argument(
"--wandb_project_name",
type=str,
default=None,
help=("If report to option is set to wandb, project name in wandb for log tracking "),
)
parser.add_argument(
"--wandb_run_name",
type=str,
default=None,
help=("If report to option is set to wandb, project name in wandb for log tracking "),
)
parser.add_argument(
"--mixed_precision",
type=str,
default=None,
choices=["no", "fp16", "bf16"],
help=(
"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to the value of accelerate config of the current system or the"
" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
),
)
parser.add_argument(
"--enable_xformers_memory_efficient_attention", action="store_true", help="Whether or not to use xformers."
)
parser.add_argument(
"--set_grads_to_none",
action="store_true",
help=(
"Save more memory by using setting grads to None instead of zero. Be aware, that this changes certain"
" behaviors, so disable this argument if it causes any problems. More info:"
" https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html"
),
)
parser.add_argument(
"--dataset_name",
type=str,
default=None,
help=(
"The name of the Dataset (from the HuggingFace hub) to train on (could be your own, possibly private,"
" dataset). It can also be a path pointing to a local copy of a dataset in your filesystem,"
" or to a folder containing files that 🤗 Datasets can understand."
),
)
parser.add_argument(
"--dataset_config_name",
type=str,
default=None,
help="The config of the Dataset, leave as None if there's only one config.",
)
parser.add_argument(
"--train_data_dir",
type=str,
default=None,
help=(
"A folder containing the training data. Folder contents must follow the structure described in"
" https://huggingface.co/docs/datasets/image_dataset#imagefolder. In particular, a `metadata.jsonl` file"
" must exist to provide the captions for the images. Ignored if `dataset_name` is specified."
),
)
parser.add_argument(
"--image_column", type=str, default="image", help="The column of the dataset containing the target image."
)
parser.add_argument(
"--conditioning_image_column",
type=str,
default="conditioning_image",
help="The column of the dataset containing the controlnet conditioning image.",
)
parser.add_argument(
"--caption_column",
type=str,
default="text",
help="The column of the dataset containing a caption or a list of captions.",
)
parser.add_argument(
"--max_train_samples",
type=int,
default=None,
help=(
"For debugging purposes or quicker training, truncate the number of training examples to this "
"value if set."
),
)
parser.add_argument(
"--proportion_empty_prompts",
type=float,
default=0,
help="Proportion of image prompts to be replaced with empty strings. Defaults to 0 (no prompt replacement).",
)
parser.add_argument(
"--validation_prompt",
type=str,
default=None,
nargs="+",
help=(
"A set of prompts evaluated every `--validation_steps` and logged to `--report_to`."
" Provide either a matching number of `--validation_image`s, a single `--validation_image`"
" to be used with all prompts, or a single prompt that will be used with all `--validation_image`s."
),
)
parser.add_argument(
"--validation_image",
type=str,
default=None,
nargs="+",
help=(
"A set of paths to the controlnet conditioning image be evaluated every `--validation_steps`"
" and logged to `--report_to`. Provide either a matching number of `--validation_prompt`s, a"
" a single `--validation_prompt` to be used with all `--validation_image`s, or a single"
" `--validation_image` that will be used with all `--validation_prompt`s."
),
)
parser.add_argument(
"--num_validation_images",
type=int,
default=4,
help="Number of images to be generated for each `--validation_image`, `--validation_prompt` pair",
)
parser.add_argument(
"--validation_steps",
type=int,
default=100,
help=(
"Run validation every X steps. Validation consists of running the prompt"
" `args.validation_prompt` multiple times: `args.num_validation_images`"
" and logging the images."
),
)
parser.add_argument(
"--tracker_project_name",
type=str,
default="train_controlnet",
help=(
"The `project_name` argument passed to Accelerator.init_trackers for"
" more information see https://huggingface.co/docs/accelerate/v0.17.0/en/package_reference/accelerator#accelerate.Accelerator"
),
)
# evaluation arguments
parser.add_argument("--controlnet_path", type=str, default=None, help="Path to pretrained controlnet.")
parser.add_argument("--unet_path", type=str, default=None, help="Path to pretrained unet.")
parser.add_argument("--adapter_name", type=str, default=None, help="Name of the adapter to use.")
parser.add_argument("--vis_overlays", action="store_true", help="Whether to visualize the landmarks.")
# self-invented arguments
parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
parser.add_argument(
"--name",
type=str,
help=("The name of the current experiment run, consists of [data]-[prompt]"),
)
# BOFT args
parser.add_argument("--use_boft", action="store_true", help="Whether to use BOFT for parameter efficient tuning")
parser.add_argument("--boft_block_num", type=int, default=8, help="The number of BOFT blocks")
parser.add_argument("--boft_block_size", type=int, default=0, help="The size of BOFT blocks")
parser.add_argument("--boft_n_butterfly_factor", type=int, default=0, help="The number of butterfly factors")
parser.add_argument("--boft_dropout", type=float, default=0.1, help="BOFT dropout, only used if use_boft is True")
parser.add_argument(
"--boft_bias",
type=str,
default="none",
help="Bias type for BOFT. Can be 'none', 'all' or 'boft_only', only used if use_boft is True",
)
if input_args is not None:
args = parser.parse_args(input_args)
else:
args = parser.parse_args()
env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
if env_local_rank != -1 and env_local_rank != args.local_rank:
args.local_rank = env_local_rank
if args.dataset_name is None and args.train_data_dir is None:
raise ValueError("Specify either `--dataset_name` or `--train_data_dir`")
if args.dataset_name is not None and args.train_data_dir is not None:
raise ValueError("Specify only one of `--dataset_name` or `--train_data_dir`")
if args.proportion_empty_prompts < 0 or args.proportion_empty_prompts > 1:
raise ValueError("`--proportion_empty_prompts` must be in the range [0, 1].")
if args.validation_prompt is not None and args.validation_image is None:
raise ValueError("`--validation_image` must be set if `--validation_prompt` is set")
if args.validation_prompt is None and args.validation_image is not None:
raise ValueError("`--validation_prompt` must be set if `--validation_image` is set")
if (
args.validation_image is not None
and args.validation_prompt is not None
and len(args.validation_image) != 1
and len(args.validation_prompt) != 1
and len(args.validation_image) != len(args.validation_prompt)
):
raise ValueError(
"Must provide either 1 `--validation_image`, 1 `--validation_prompt`,"
" or the same number of `--validation_prompt`s and `--validation_image`s"
)
if args.resolution % 8 != 0:
raise ValueError(
"`--resolution` must be divisible by 8 for consistently sized encoded images between the VAE and the controlnet encoder."
)
return args

View File

@ -0,0 +1,207 @@
import random
import numpy as np
import torch
import wandb
from datasets import load_dataset
from diffusers import DDIMScheduler
from PIL import Image
from torchvision import transforms
from utils.pipeline_controlnet import LightControlNetPipeline
def image_grid(imgs, rows, cols):
assert len(imgs) == rows * cols
w, h = imgs[0].size
grid = Image.new("RGB", size=(cols * w, rows * h))
for i, img in enumerate(imgs):
grid.paste(img, box=(i % cols * w, i // cols * h))
return grid
def log_validation(val_dataset, text_encoder, unet, controlnet, args, accelerator):
pipeline = LightControlNetPipeline.from_pretrained(
args.pretrained_model_name_or_path,
controlnet=accelerator.unwrap_model(controlnet, keep_fp32_wrapper=True),
unet=accelerator.unwrap_model(unet, keep_fp32_wrapper=True).model,
text_encoder=accelerator.unwrap_model(text_encoder, keep_fp32_wrapper=True),
safety_checker=None,
revision=args.revision,
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to(accelerator.device)
pipeline.set_progress_bar_config(disable=True)
generator = torch.Generator(device=accelerator.device).manual_seed(args.seed)
image_logs = []
for idx in range(args.num_validation_images):
data = val_dataset[idx]
validation_prompt = data["text"]
validation_image = data["conditioning_pixel_values"]
image = pipeline(
validation_prompt,
[validation_image],
num_inference_steps=50,
generator=generator,
)[0][0]
image_logs.append(
{
"validation_image": validation_image,
"image": image,
"validation_prompt": validation_prompt,
}
)
for tracker in accelerator.trackers:
formatted_images = []
for log in image_logs:
image = log["image"]
validation_prompt = log["validation_prompt"]
validation_image = log["validation_image"]
formatted_images.append(wandb.Image(validation_image, caption="Controlnet conditioning"))
image = wandb.Image(image, caption=validation_prompt)
formatted_images.append(image)
tracker.log({"validation": formatted_images})
del pipeline
torch.cuda.empty_cache()
def make_dataset(args, tokenizer, accelerator, split="train"):
# Get the datasets: you can either provide your own training and evaluation files (see below)
# or specify a Dataset from the hub (the dataset will be downloaded automatically from the datasets Hub).
# In distributed training, the load_dataset function guarantees that only one local process can concurrently
# download the dataset.
if args.dataset_name is not None:
# Downloading and loading a dataset from the hub.
dataset = load_dataset(
args.dataset_name,
args.dataset_config_name,
cache_dir=args.cache_dir,
)
else:
if args.train_data_dir is not None:
dataset = load_dataset(
args.train_data_dir,
cache_dir=args.cache_dir,
)
# See more about loading custom images at
# https://huggingface.co/docs/datasets/v2.0.0/en/dataset_script
# Preprocessing the datasets.
# We need to tokenize inputs and targets.
column_names = dataset[split].column_names
# Get the column names for input/target.
if args.image_column is None:
image_column = column_names[0]
else:
image_column = args.image_column
if image_column not in column_names:
raise ValueError(
f"`--image_column` value '{args.image_column}' not found in dataset columns. Dataset columns are: {', '.join(column_names)}"
)
if args.caption_column is None:
caption_column = column_names[1]
else:
caption_column = args.caption_column
if caption_column not in column_names:
raise ValueError(
f"`--caption_column` value '{args.caption_column}' not found in dataset columns. Dataset columns are: {', '.join(column_names)}"
)
if args.conditioning_image_column is None:
conditioning_image_column = column_names[2]
else:
conditioning_image_column = args.conditioning_image_column
if conditioning_image_column not in column_names:
raise ValueError(
f"`--conditioning_image_column` value '{args.conditioning_image_column}' not found in dataset columns. Dataset columns are: {', '.join(column_names)}"
)
def tokenize_captions(examples, is_train=True):
captions = []
for caption in examples[caption_column]:
if random.random() < args.proportion_empty_prompts:
captions.append("")
elif isinstance(caption, str):
captions.append(caption)
elif isinstance(caption, (list, np.ndarray)):
# take a random caption if there are multiple
captions.append(random.choice(caption) if is_train else caption[0])
else:
raise ValueError(
f"Caption column `{caption_column}` should contain either strings or lists of strings."
)
inputs = tokenizer(
captions, max_length=tokenizer.model_max_length, padding="max_length", truncation=True, return_tensors="pt"
)
return inputs.input_ids
image_transforms = transforms.Compose(
[
transforms.Resize(args.resolution, interpolation=transforms.InterpolationMode.BILINEAR),
transforms.CenterCrop(args.resolution),
transforms.ToTensor(),
transforms.Normalize([0.5], [0.5]),
]
)
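    # Target images are normalized to [-1, 1]; the conditioning images below are kept in [0, 1].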
conditioning_image_transforms = transforms.Compose(
[
transforms.Resize(args.resolution, interpolation=transforms.InterpolationMode.BILINEAR),
transforms.CenterCrop(args.resolution),
transforms.ToTensor(),
]
)
def preprocess_train(examples):
images = [image.convert("RGB") for image in examples[image_column]]
images = [image_transforms(image) for image in images]
conditioning_images = [image.convert("RGB") for image in examples[conditioning_image_column]]
conditioning_images = [conditioning_image_transforms(image) for image in conditioning_images]
examples["pixel_values"] = images
examples["conditioning_pixel_values"] = conditioning_images
examples["input_ids"] = tokenize_captions(examples)
return examples
with accelerator.main_process_first():
if args.max_train_samples is not None:
dataset[split] = dataset[split].shuffle(seed=args.seed).select(range(args.max_train_samples))
# Set the training transforms
split_dataset = dataset[split].with_transform(preprocess_train)
return split_dataset
def collate_fn(examples):
pixel_values = torch.stack([example["pixel_values"] for example in examples])
pixel_values = pixel_values.to(memory_format=torch.contiguous_format).float()
conditioning_pixel_values = torch.stack([example["conditioning_pixel_values"] for example in examples])
conditioning_pixel_values = conditioning_pixel_values.to(memory_format=torch.contiguous_format).float()
input_ids = torch.stack([example["input_ids"] for example in examples])
return {
"pixel_values": pixel_values,
"conditioning_pixel_values": conditioning_pixel_values,
"input_ids": input_ids,
}

View File

@ -0,0 +1,263 @@
# Copyright 2023 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union
import torch
from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers.models.attention_processor import AttentionProcessor, AttnProcessor
from diffusers.models.modeling_utils import ModelMixin
from diffusers.models.unet_2d_blocks import (
CrossAttnDownBlock2D,
DownBlock2D,
)
from diffusers.utils import BaseOutput, logging
from torch import nn
from torch.nn import functional as F
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
@dataclass
class ControlNetOutput(BaseOutput):
down_block_res_samples: Tuple[torch.Tensor]
mid_block_res_sample: torch.Tensor
class ControlNetConditioningEmbedding(nn.Module):
"""
Quoting from https://arxiv.org/abs/2302.05543: "Stable Diffusion uses a pre-processing method similar to VQ-GAN
[11] to convert the entire dataset of 512 × 512 images into smaller 64 × 64 “latent images” for stabilized
training. This requires ControlNets to convert image-based conditions to 64 × 64 feature space to match the
convolution size. We use a tiny network E(·) of four convolution layers with 4 × 4 kernels and 2 × 2 strides
(activated by ReLU, channels are 16, 32, 64, 128, initialized with Gaussian weights, trained jointly with the full
model) to encode image-space conditions ... into feature maps ..."
"""
def __init__(
self,
conditioning_embedding_channels: int,
conditioning_channels: int = 3,
block_out_channels: Tuple[int] = (16, 32, 96, 256),
):
super().__init__()
self.conv_in = nn.Conv2d(conditioning_channels, block_out_channels[0], kernel_size=3, padding=1)
self.blocks = nn.ModuleList([])
for i in range(len(block_out_channels) - 1):
channel_in = block_out_channels[i]
channel_out = block_out_channels[i + 1]
self.blocks.append(nn.Conv2d(channel_in, channel_in, kernel_size=3, padding=1))
self.blocks.append(nn.Conv2d(channel_in, channel_out, kernel_size=3, padding=1, stride=2))
self.conv_out = zero_module(
nn.Conv2d(block_out_channels[-1], conditioning_embedding_channels, kernel_size=3, padding=1)
)
def forward(self, conditioning):
embedding = self.conv_in(conditioning)
embedding = F.silu(embedding)
for block in self.blocks:
embedding = block(embedding)
embedding = F.silu(embedding)
embedding = self.conv_out(embedding)
return embedding
class ControlNetModel(ModelMixin, ConfigMixin):
_supports_gradient_checkpointing = True
@register_to_config
def __init__(
self,
in_channels: int = 4,
out_channels: int = 320,
controlnet_conditioning_channel_order: str = "rgb",
conditioning_embedding_out_channels: Optional[Tuple[int]] = (16, 32, 96, 256),
):
super().__init__()
# for control image
self.controlnet_cond_embedding = ControlNetConditioningEmbedding(
conditioning_embedding_channels=out_channels,
block_out_channels=conditioning_embedding_out_channels,
)
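        # Unlike the full ControlNet, this lightweight variant keeps only the conditioning embedding:
        # forward() returns a single guided hint (default 320 channels) that is passed to the UNet.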
@property
# Copied from diffusers.models.unet_2d_condition.UNet2DConditionModel.attn_processors
def attn_processors(self) -> Dict[str, AttentionProcessor]:
r"""
Returns:
            `dict` of attention processors: A dictionary containing all attention processors used in the model,
            indexed by its weight name.
"""
# set recursively
processors = {}
def fn_recursive_add_processors(name: str, module: torch.nn.Module, processors: Dict[str, AttentionProcessor]):
if hasattr(module, "set_processor"):
processors[f"{name}.processor"] = module.processor
for sub_name, child in module.named_children():
fn_recursive_add_processors(f"{name}.{sub_name}", child, processors)
return processors
for name, module in self.named_children():
fn_recursive_add_processors(name, module, processors)
return processors
# Copied from diffusers.models.unet_2d_condition.UNet2DConditionModel.set_attn_processor
def set_attn_processor(self, processor: Union[AttentionProcessor, Dict[str, AttentionProcessor]]):
r"""
Parameters:
            processor (`dict` of `AttentionProcessor` or `AttentionProcessor`):
The instantiated processor class or a dictionary of processor classes that will be set as the processor
of **all** `Attention` layers.
            In case `processor` is a dict, the key needs to define the path to the corresponding cross attention processor. This is strongly recommended when setting trainable attention processors.
"""
count = len(self.attn_processors.keys())
if isinstance(processor, dict) and len(processor) != count:
raise ValueError(
f"A dict of processors was passed, but the number of processors {len(processor)} does not match the"
f" number of attention layers: {count}. Please make sure to pass {count} processor classes."
)
def fn_recursive_attn_processor(name: str, module: torch.nn.Module, processor):
if hasattr(module, "set_processor"):
if not isinstance(processor, dict):
module.set_processor(processor)
else:
module.set_processor(processor.pop(f"{name}.processor"))
for sub_name, child in module.named_children():
fn_recursive_attn_processor(f"{name}.{sub_name}", child, processor)
for name, module in self.named_children():
fn_recursive_attn_processor(name, module, processor)
# Copied from diffusers.models.unet_2d_condition.UNet2DConditionModel.set_default_attn_processor
def set_default_attn_processor(self):
"""
Disables custom attention processors and sets the default attention implementation.
"""
self.set_attn_processor(AttnProcessor())
# Copied from diffusers.models.unet_2d_condition.UNet2DConditionModel.set_attention_slice
def set_attention_slice(self, slice_size):
r"""
Enable sliced attention computation.
When this option is enabled, the attention module will split the input tensor into slices to compute attention
in several steps. This is useful for saving some memory in exchange for a small decrease in speed.
Args:
slice_size (`str` or `int` or `list(int)`, *optional*, defaults to `"auto"`):
When `"auto"`, halves the input to the attention heads, so attention will be computed in two steps. If
`"max"`, maximum amount of memory will be saved by running only one slice at a time. If a number is
provided, uses as many slices as `attention_head_dim // slice_size`. In this case, `attention_head_dim`
must be a multiple of `slice_size`.
"""
sliceable_head_dims = []
def fn_recursive_retrieve_sliceable_dims(module: torch.nn.Module):
if hasattr(module, "set_attention_slice"):
sliceable_head_dims.append(module.sliceable_head_dim)
for child in module.children():
fn_recursive_retrieve_sliceable_dims(child)
# retrieve number of attention layers
for module in self.children():
fn_recursive_retrieve_sliceable_dims(module)
num_sliceable_layers = len(sliceable_head_dims)
if slice_size == "auto":
# half the attention head size is usually a good trade-off between
# speed and memory
slice_size = [dim // 2 for dim in sliceable_head_dims]
elif slice_size == "max":
# make smallest slice possible
slice_size = num_sliceable_layers * [1]
slice_size = num_sliceable_layers * [slice_size] if not isinstance(slice_size, list) else slice_size
if len(slice_size) != len(sliceable_head_dims):
raise ValueError(
f"You have provided {len(slice_size)}, but {self.config} has {len(sliceable_head_dims)} different"
f" attention layers. Make sure to match `len(slice_size)` to be {len(sliceable_head_dims)}."
)
for i in range(len(slice_size)):
size = slice_size[i]
dim = sliceable_head_dims[i]
if size is not None and size > dim:
raise ValueError(f"size {size} has to be smaller or equal to {dim}.")
# Recursively walk through all the children.
# Any children which exposes the set_attention_slice method
# gets the message
def fn_recursive_set_attention_slice(module: torch.nn.Module, slice_size: List[int]):
if hasattr(module, "set_attention_slice"):
module.set_attention_slice(slice_size.pop())
for child in module.children():
fn_recursive_set_attention_slice(child, slice_size)
reversed_slice_size = list(reversed(slice_size))
for module in self.children():
fn_recursive_set_attention_slice(module, reversed_slice_size)
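# Illustrative usage of set_attention_slice (added comment, not part of the original code):
#   controlnet.set_attention_slice("auto")  # halve each head's input so attention runs in two steps
#   controlnet.set_attention_slice(2)       # use attention_head_dim // 2 slices per sliceable layer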
def _set_gradient_checkpointing(self, module, value=False):
if isinstance(module, (CrossAttnDownBlock2D, DownBlock2D)):
module.gradient_checkpointing = value
def forward(
self,
controlnet_cond: torch.FloatTensor,
) -> Union[ControlNetOutput, Tuple]:
# check channel order
channel_order = self.config.controlnet_conditioning_channel_order
if channel_order == "rgb":
# in rgb order by default
...
elif channel_order == "bgr":
controlnet_cond = torch.flip(controlnet_cond, dims=[1])
else:
raise ValueError(f"unknown `controlnet_conditioning_channel_order`: {channel_order}")
# 2. pre-process
controlnet_cond = self.controlnet_cond_embedding(controlnet_cond)
return controlnet_cond
def zero_module(module):
for p in module.parameters():
nn.init.zeros_(p)
return module
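# Shape walk-through (illustrative, added comment): a (B, 3, 512, 512) conditioning image passes through
# three stride-2 convolutions (block_out_channels has four entries), so conv_out yields a (B, 320, 64, 64)
# hint when conditioning_embedding_channels=320. That matches the UNet's conv_in output, to which the hint
# is added in UNet2DConditionNewModel.forward via `guided_hint`.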

View File

@@ -0,0 +1,452 @@
# Copyright 2023 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Union
import numpy as np
import PIL.Image
import torch
from diffusers.pipelines.controlnet.multicontrolnet import MultiControlNetModel
from diffusers.pipelines.controlnet.pipeline_controlnet import StableDiffusionControlNetPipeline
from diffusers.utils import BaseOutput, is_compiled_module, logging
from torch.nn import functional as F
from utils.light_controlnet import ControlNetModel
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
@dataclass
class LightControlNetPipelineOutput(BaseOutput):
"""
Output class for Stable Diffusion pipelines.
Args:
images (`List[PIL.Image.Image]` or `np.ndarray`)
List of denoised PIL images of length `batch_size` or numpy array of shape `(batch_size, height, width,
num_channels)`. PIL images or numpy arrays represent the denoised images of the diffusion pipeline.
nsfw_content_detected (`List[bool]`)
List of flags denoting whether the corresponding generated image likely represents "not-safe-for-work"
(nsfw) content, or `None` if safety checking could not be performed.
"""
images: Union[List[PIL.Image.Image], np.ndarray]
nsfw_content_detected: Optional[List[bool]]
class LightControlNetPipeline(StableDiffusionControlNetPipeline):
_optional_components = ["safety_checker", "feature_extractor"]
def check_inputs(
self,
prompt,
image,
callback_steps,
negative_prompt=None,
prompt_embeds=None,
negative_prompt_embeds=None,
controlnet_conditioning_scale=1.0,
):
if (callback_steps is None) or (
callback_steps is not None and (not isinstance(callback_steps, int) or callback_steps <= 0)
):
raise ValueError(
f"`callback_steps` has to be a positive integer but is {callback_steps} of type"
f" {type(callback_steps)}."
)
if prompt is not None and prompt_embeds is not None:
raise ValueError(
f"Cannot forward both `prompt`: {prompt} and `prompt_embeds`: {prompt_embeds}. Please make sure to"
" only forward one of the two."
)
elif prompt is None and prompt_embeds is None:
raise ValueError(
"Provide either `prompt` or `prompt_embeds`. Cannot leave both `prompt` and `prompt_embeds` undefined."
)
elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):
raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")
if negative_prompt is not None and negative_prompt_embeds is not None:
raise ValueError(
f"Cannot forward both `negative_prompt`: {negative_prompt} and `negative_prompt_embeds`:"
f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
)
if prompt_embeds is not None and negative_prompt_embeds is not None:
if prompt_embeds.shape != negative_prompt_embeds.shape:
raise ValueError(
"`prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but"
f" got: `prompt_embeds` {prompt_embeds.shape} != `negative_prompt_embeds`"
f" {negative_prompt_embeds.shape}."
)
# `prompt` needs more sophisticated handling when there are multiple
# conditionings.
if isinstance(self.controlnet, MultiControlNetModel):
if isinstance(prompt, list):
logger.warning(
f"You have {len(self.controlnet.nets)} ControlNets and you have passed {len(prompt)}"
" prompts. The conditionings will be fixed across the prompts."
)
# Check `image`
is_compiled = hasattr(F, "scaled_dot_product_attention") and isinstance(
self.controlnet, torch._dynamo.eval_frame.OptimizedModule
)
if (
isinstance(self.controlnet, ControlNetModel)
or is_compiled
and isinstance(self.controlnet._orig_mod, ControlNetModel)
):
self.check_image(image, prompt, prompt_embeds)
elif (
isinstance(self.controlnet, MultiControlNetModel)
or is_compiled
and isinstance(self.controlnet._orig_mod, MultiControlNetModel)
):
if not isinstance(image, list):
raise TypeError("For multiple controlnets: `image` must be type `list`")
# When `image` is a nested list:
# (e.g. [[canny_image_1, pose_image_1], [canny_image_2, pose_image_2]])
elif any(isinstance(i, list) for i in image):
raise ValueError("Only a single batch of multiple conditionings is supported at the moment.")
elif len(image) != len(self.controlnet.nets):
raise ValueError(
"For multiple controlnets: `image` must have the same length as the number of controlnets."
)
for image_ in image:
self.check_image(image_, prompt, prompt_embeds)
else:
assert False
# Check `controlnet_conditioning_scale`
if (
isinstance(self.controlnet, ControlNetModel)
or is_compiled
and isinstance(self.controlnet._orig_mod, ControlNetModel)
):
if not isinstance(controlnet_conditioning_scale, float):
raise TypeError("For single controlnet: `controlnet_conditioning_scale` must be type `float`.")
elif (
isinstance(self.controlnet, MultiControlNetModel)
or is_compiled
and isinstance(self.controlnet._orig_mod, MultiControlNetModel)
):
if isinstance(controlnet_conditioning_scale, list):
if any(isinstance(i, list) for i in controlnet_conditioning_scale):
raise ValueError("Only a single batch of multiple conditionings is supported at the moment.")
elif isinstance(controlnet_conditioning_scale, list) and len(controlnet_conditioning_scale) != len(
self.controlnet.nets
):
raise ValueError(
"For multiple controlnets: When `controlnet_conditioning_scale` is specified as `list`, it must have"
" the same length as the number of controlnets"
)
else:
assert False
@torch.no_grad()
def __call__(
self,
prompt: Union[str, List[str]] = None,
image: Union[
torch.FloatTensor,
PIL.Image.Image,
np.ndarray,
List[torch.FloatTensor],
List[PIL.Image.Image],
List[np.ndarray],
] = None,
height: Optional[int] = None,
width: Optional[int] = None,
num_inference_steps: int = 50,
guidance_scale: float = 7.5,
negative_prompt: Optional[Union[str, List[str]]] = None,
num_images_per_prompt: Optional[int] = 1,
eta: float = 0.0,
generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
latents: Optional[torch.FloatTensor] = None,
prompt_embeds: Optional[torch.FloatTensor] = None,
negative_prompt_embeds: Optional[torch.FloatTensor] = None,
output_type: Optional[str] = "pil",
return_dict: bool = True,
callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None,
callback_steps: int = 1,
cross_attention_kwargs: Optional[Dict[str, Any]] = None,
controlnet_conditioning_scale: Union[float, List[float]] = 1.0,
guess_mode: bool = False,
):
r"""
Function invoked when calling the pipeline for generation.
Args:
prompt (`str` or `List[str]`, *optional*):
The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
instead.
image (`torch.FloatTensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.FloatTensor]`, `List[PIL.Image.Image]`, `List[np.ndarray]`,
`List[List[torch.FloatTensor]]`, `List[List[np.ndarray]]` or `List[List[PIL.Image.Image]]`):
The ControlNet input condition. ControlNet uses this input condition to generate guidance for the UNet. If
the type is specified as `torch.FloatTensor`, it is passed to ControlNet as is. `PIL.Image.Image` can
also be accepted as an image. The dimensions of the output image default to `image`'s dimensions. If
height and/or width are passed, `image` is resized according to them. If multiple ControlNets are
specified in init, images must be passed as a list such that each element of the list can be correctly
batched for input to a single controlnet.
height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
The height in pixels of the generated image.
width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
The width in pixels of the generated image.
num_inference_steps (`int`, *optional*, defaults to 50):
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
guidance_scale (`float`, *optional*, defaults to 7.5):
Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
`guidance_scale` is defined as `w` of equation 2. of [Imagen
Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
1`. A higher guidance scale encourages the model to generate images that are closely linked to the text `prompt`,
usually at the expense of lower image quality.
negative_prompt (`str` or `List[str]`, *optional*):
The prompt or prompts not to guide the image generation. If not defined, one has to pass
`negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
less than `1`).
num_images_per_prompt (`int`, *optional*, defaults to 1):
The number of images to generate per prompt.
eta (`float`, *optional*, defaults to 0.0):
Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to
[`schedulers.DDIMScheduler`], will be ignored for others.
generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
to make generation deterministic.
latents (`torch.FloatTensor`, *optional*):
Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
tensor will be generated by sampling using the supplied random `generator`.
prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
provided, text embeddings will be generated from `prompt` input argument.
negative_prompt_embeds (`torch.FloatTensor`, *optional*):
Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
argument.
output_type (`str`, *optional*, defaults to `"pil"`):
The output format of the generated image. Choose between
[PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] instead of a
plain tuple.
callback (`Callable`, *optional*):
A function that will be called every `callback_steps` steps during inference. The function will be
called with the following arguments: `callback(step: int, timestep: int, latents: torch.FloatTensor)`.
callback_steps (`int`, *optional*, defaults to 1):
The frequency at which the `callback` function will be called. If not specified, the callback will be
called at every step.
cross_attention_kwargs (`dict`, *optional*):
A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under
`self.processor` in
[diffusers.cross_attention](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py).
controlnet_conditioning_scale (`float` or `List[float]`, *optional*, defaults to 1.0):
The outputs of the controlnet are multiplied by `controlnet_conditioning_scale` before they are added
to the residual in the original unet. If multiple ControlNets are specified in init, you can set the
corresponding scale as a list.
guess_mode (`bool`, *optional*, defaults to `False`):
In this mode, the ControlNet encoder will try its best to recognize the content of the input image even if
you remove all prompts. A `guidance_scale` between 3.0 and 5.0 is recommended.
Examples:
Returns:
[`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] or `tuple`:
[`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] if `return_dict` is True, otherwise a `tuple`.
When returning a tuple, the first element is a list with the generated images, and the second element is a
list of `bool`s denoting whether the corresponding generated image likely represents "not-safe-for-work"
(nsfw) content, according to the `safety_checker`.
"""
# 1. Check inputs. Raise error if not correct
self.check_inputs(
prompt,
image,
callback_steps,
negative_prompt,
prompt_embeds,
negative_prompt_embeds,
controlnet_conditioning_scale,
)
# 2. Define call parameters
if prompt is not None and isinstance(prompt, str):
batch_size = 1
elif prompt is not None and isinstance(prompt, list):
batch_size = len(prompt)
else:
batch_size = prompt_embeds.shape[0]
device = self._execution_device
# here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
# of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
# corresponds to doing no classifier free guidance.
do_classifier_free_guidance = guidance_scale > 1.0
controlnet = self.controlnet._orig_mod if is_compiled_module(self.controlnet) else self.controlnet
if isinstance(controlnet, MultiControlNetModel) and isinstance(controlnet_conditioning_scale, float):
controlnet_conditioning_scale = [controlnet_conditioning_scale] * len(controlnet.nets)
# 3. Encode input prompt
text_encoder_lora_scale = (
cross_attention_kwargs.get("scale", None) if cross_attention_kwargs is not None else None
)
prompt_embeds = self._encode_prompt(
prompt,
device,
num_images_per_prompt,
do_classifier_free_guidance,
negative_prompt,
prompt_embeds=prompt_embeds,
negative_prompt_embeds=negative_prompt_embeds,
lora_scale=text_encoder_lora_scale,
)
# 4. Prepare image
if isinstance(controlnet, ControlNetModel):
image = self.prepare_image(
image=image,
width=width,
height=height,
batch_size=batch_size * num_images_per_prompt,
num_images_per_prompt=num_images_per_prompt,
device=device,
dtype=controlnet.dtype,
do_classifier_free_guidance=do_classifier_free_guidance,
guess_mode=guess_mode,
)
height, width = image.shape[-2:]
elif isinstance(controlnet, MultiControlNetModel):
images = []
for image_ in image:
image_ = self.prepare_image(
image=image_,
width=width,
height=height,
batch_size=batch_size * num_images_per_prompt,
num_images_per_prompt=num_images_per_prompt,
device=device,
dtype=controlnet.dtype,
do_classifier_free_guidance=do_classifier_free_guidance,
guess_mode=guess_mode,
)
images.append(image_)
image = images
height, width = image[0].shape[-2:]
else:
assert False
# 5. Prepare timesteps
self.scheduler.set_timesteps(num_inference_steps, device=device)
timesteps = self.scheduler.timesteps
# 6. Prepare latent variables
num_channels_latents = self.unet.config.in_channels
latents = self.prepare_latents(
batch_size * num_images_per_prompt,
num_channels_latents,
height,
width,
prompt_embeds.dtype,
device,
generator,
latents,
)
# 7. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
# 8. Denoising loop
num_warmup_steps = len(timesteps) - num_inference_steps * self.scheduler.order
with self.progress_bar(total=num_inference_steps) as progress_bar:
for i, t in enumerate(timesteps):
# expand the latents if we are doing classifier free guidance
latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
# controlnet(s) inference
if guess_mode and do_classifier_free_guidance:
# Infer ControlNet only for the conditional batch.
control_model_input = latents
control_model_input = self.scheduler.scale_model_input(control_model_input, t)
else:
control_model_input = latent_model_input
# Get the guided hint for the UNet (320 dim)
guided_hint = self.controlnet(
controlnet_cond=image,
)
# Predict the noise residual
noise_pred = self.unet(
latent_model_input,
t,
guided_hint=guided_hint,
encoder_hidden_states=prompt_embeds,
)[0]
# perform guidance
if do_classifier_free_guidance:
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
# compute the previous noisy sample x_t -> x_t-1
latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
# call the callback, if provided
if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
progress_bar.update()
if callback is not None and i % callback_steps == 0:
callback(i, t, latents)
# If we do sequential model offloading, let's offload unet and controlnet
# manually for max memory savings
if hasattr(self, "final_offload_hook") and self.final_offload_hook is not None:
self.unet.to("cpu")
self.controlnet.to("cpu")
torch.cuda.empty_cache()
if not output_type == "latent":
image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
image, has_nsfw_concept = self.run_safety_checker(image, device, prompt_embeds.dtype)
else:
image = latents
has_nsfw_concept = None
if has_nsfw_concept is None:
do_denormalize = [True] * image.shape[0]
else:
do_denormalize = [not has_nsfw for has_nsfw in has_nsfw_concept]
image = self.image_processor.postprocess(image, output_type=output_type, do_denormalize=do_denormalize)
# Offload last model to CPU
if hasattr(self, "final_offload_hook") and self.final_offload_hook is not None:
self.final_offload_hook.offload()
if not return_dict:
return (image, has_nsfw_concept)
return LightControlNetPipelineOutput(images=image, nsfw_content_detected=has_nsfw_concept)

View File

@@ -0,0 +1,58 @@
import gc
import threading
import psutil
import torch
# Converting Bytes to Megabytes
def b2mb(x):
return int(x / 2**20)
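# e.g. b2mb(5 * 2**20) == 5  (illustrative, added comment)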
# This context manager is used to track the peak memory usage of the process
class TorchTracemalloc:
def __enter__(self):
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero
self.begin = torch.cuda.memory_allocated()
self.process = psutil.Process()
self.cpu_begin = self.cpu_mem_used()
self.peak_monitoring = True
peak_monitor_thread = threading.Thread(target=self.peak_monitor_func)
peak_monitor_thread.daemon = True
peak_monitor_thread.start()
return self
def cpu_mem_used(self):
"""get resident set size memory for the current process"""
return self.process.memory_info().rss
def peak_monitor_func(self):
self.cpu_peak = -1
while True:
self.cpu_peak = max(self.cpu_mem_used(), self.cpu_peak)
# can't sleep or will not catch the peak right (this comment is here on purpose)
# time.sleep(0.001) # 1msec
if not self.peak_monitoring:
break
def __exit__(self, *exc):
self.peak_monitoring = False
gc.collect()
torch.cuda.empty_cache()
self.end = torch.cuda.memory_allocated()
self.peak = torch.cuda.max_memory_allocated()
self.used = b2mb(self.end - self.begin)
self.peaked = b2mb(self.peak - self.begin)
self.cpu_end = self.cpu_mem_used()
self.cpu_used = b2mb(self.cpu_end - self.cpu_begin)
self.cpu_peaked = b2mb(self.cpu_peak - self.cpu_begin)
# print(f"delta used/peak {self.used:4d}/{self.peaked:4d}")
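# Illustrative usage (added comment, not part of the original utility):
#
#     with TorchTracemalloc() as tracemalloc:
#         run_training_step()  # any workload to profile; hypothetical helper
#     print(f"GPU delta used/peaked: {tracemalloc.used}MB / {tracemalloc.peaked}MB")
#     print(f"CPU delta used/peaked: {tracemalloc.cpu_used}MB / {tracemalloc.cpu_peaked}MB")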

View File

@@ -0,0 +1,277 @@
# Copyright 2023 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple, Union
import torch
from diffusers.models import UNet2DConditionModel
from diffusers.utils import BaseOutput, logging
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
@dataclass
class UNet2DConditionOutput(BaseOutput):
"""
Args:
sample (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
Hidden states conditioned on `encoder_hidden_states` input. Output of last layer of model.
"""
sample: torch.FloatTensor
class UNet2DConditionNewModel(UNet2DConditionModel):
def forward(
self,
sample: torch.FloatTensor,
timestep: Union[torch.Tensor, float, int],
encoder_hidden_states: torch.Tensor,
guided_hint: Optional[torch.Tensor] = None,
class_labels: Optional[torch.Tensor] = None,
timestep_cond: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
cross_attention_kwargs: Optional[Dict[str, Any]] = None,
added_cond_kwargs: Optional[Dict[str, torch.Tensor]] = None,
down_block_additional_residuals: Optional[Tuple[torch.Tensor]] = None,
mid_block_additional_residual: Optional[torch.Tensor] = None,
encoder_attention_mask: Optional[torch.Tensor] = None,
return_dict: bool = True,
) -> Union[UNet2DConditionOutput, Tuple]:
r"""
Args:
sample (`torch.FloatTensor`): (batch, channel, height, width) noisy inputs tensor
timestep (`torch.FloatTensor` or `float` or `int`): (batch) timesteps
encoder_hidden_states (`torch.FloatTensor`): (batch, sequence_length, feature_dim) encoder hidden states
encoder_attention_mask (`torch.Tensor`):
(batch, sequence_length) cross-attention mask, applied to encoder_hidden_states. True = keep, False =
discard. Mask will be converted into a bias, which adds large negative values to attention scores
corresponding to "discard" tokens.
return_dict (`bool`, *optional*, defaults to `True`):
Whether or not to return a [`models.unet_2d_condition.UNet2DConditionOutput`] instead of a plain tuple.
cross_attention_kwargs (`dict`, *optional*):
A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under
`self.processor` in
[diffusers.cross_attention](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py).
added_cond_kwargs (`dict`, *optional*):
A kwargs dictionary that if specified includes additional conditions that can be used for additional time
embeddings or encoder hidden states projections. See the configurations `encoder_hid_dim_type` and
`addition_embed_type` for more information.
Returns:
[`~models.unet_2d_condition.UNet2DConditionOutput`] or `tuple`:
[`~models.unet_2d_condition.UNet2DConditionOutput`] if `return_dict` is True, otherwise a `tuple`. When
returning a tuple, the first element is the sample tensor.
"""
# By default samples have to be at least a multiple of the overall upsampling factor.
# The overall upsampling factor is equal to 2 ** (# num of upsampling layers).
# However, the upsampling interpolation output size can be forced to fit any upsampling size
# on the fly if necessary.
default_overall_up_factor = 2**self.num_upsamplers
# upsample size should be forwarded when sample is not a multiple of `default_overall_up_factor`
forward_upsample_size = False
upsample_size = None
if any(s % default_overall_up_factor != 0 for s in sample.shape[-2:]):
logger.info("Forward upsample size to force interpolation output size.")
forward_upsample_size = True
# ensure attention_mask is a bias, and give it a singleton query_tokens dimension
# expects mask of shape:
# [batch, key_tokens]
# adds singleton query_tokens dimension:
# [batch, 1, key_tokens]
# this helps to broadcast it as a bias over attention scores, which will be in one of the following shapes:
# [batch, heads, query_tokens, key_tokens] (e.g. torch sdp attn)
# [batch * heads, query_tokens, key_tokens] (e.g. xformers or classic attn)
if attention_mask is not None:
# assume that mask is expressed as:
# (1 = keep, 0 = discard)
# convert mask into a bias that can be added to attention scores:
# (keep = +0, discard = -10000.0)
attention_mask = (1 - attention_mask.to(sample.dtype)) * -10000.0
attention_mask = attention_mask.unsqueeze(1)
# convert encoder_attention_mask to a bias the same way we do for attention_mask
if encoder_attention_mask is not None:
encoder_attention_mask = (1 - encoder_attention_mask.to(sample.dtype)) * -10000.0
encoder_attention_mask = encoder_attention_mask.unsqueeze(1)
# 0. center input if necessary
if self.config.center_input_sample:
sample = 2 * sample - 1.0
# 1. time
timesteps = timestep
if not torch.is_tensor(timesteps):
# TODO: this requires sync between CPU and GPU. So try to pass timesteps as tensors if you can
# This would be a good case for the `match` statement (Python 3.10+)
is_mps = sample.device.type == "mps"
if isinstance(timestep, float):
dtype = torch.float32 if is_mps else torch.float64
else:
dtype = torch.int32 if is_mps else torch.int64
timesteps = torch.tensor([timesteps], dtype=dtype, device=sample.device)
elif len(timesteps.shape) == 0:
timesteps = timesteps[None].to(sample.device)
# broadcast to batch dimension in a way that's compatible with ONNX/Core ML
timesteps = timesteps.expand(sample.shape[0])
t_emb = self.time_proj(timesteps)
# `Timesteps` does not contain any weights and will always return f32 tensors
# but time_embedding might actually be running in fp16. so we need to cast here.
# there might be better ways to encapsulate this.
t_emb = t_emb.to(dtype=sample.dtype)
emb = self.time_embedding(t_emb, timestep_cond)
if self.class_embedding is not None:
if class_labels is None:
raise ValueError("class_labels should be provided when num_class_embeds > 0")
if self.config.class_embed_type == "timestep":
class_labels = self.time_proj(class_labels)
# `Timesteps` does not contain any weights and will always return f32 tensors
# there might be better ways to encapsulate this.
class_labels = class_labels.to(dtype=sample.dtype)
class_emb = self.class_embedding(class_labels).to(dtype=sample.dtype)
if self.config.class_embeddings_concat:
emb = torch.cat([emb, class_emb], dim=-1)
else:
emb = emb + class_emb
if self.config.addition_embed_type == "text":
aug_emb = self.add_embedding(encoder_hidden_states)
emb = emb + aug_emb
elif self.config.addition_embed_type == "text_image":
# Kandinsky 2.1 style
if "image_embeds" not in added_cond_kwargs:
raise ValueError(
f"{self.__class__} has the config param `addition_embed_type` set to 'text_image' which requires the keyword argument `image_embeds` to be passed in `added_cond_kwargs`"
)
image_embs = added_cond_kwargs.get("image_embeds")
text_embs = added_cond_kwargs.get("text_embeds", encoder_hidden_states)
aug_emb = self.add_embedding(text_embs, image_embs)
emb = emb + aug_emb
if self.time_embed_act is not None:
emb = self.time_embed_act(emb)
if self.encoder_hid_proj is not None and self.config.encoder_hid_dim_type == "text_proj":
encoder_hidden_states = self.encoder_hid_proj(encoder_hidden_states)
elif self.encoder_hid_proj is not None and self.config.encoder_hid_dim_type == "text_image_proj":
# Kandinsky 2.1 style
if "image_embeds" not in added_cond_kwargs:
raise ValueError(
f"{self.__class__} has the config param `encoder_hid_dim_type` set to 'text_image_proj' which requires the keyword argument `image_embeds` to be passed in `added_cond_kwargs`"
)
image_embeds = added_cond_kwargs.get("image_embeds")
encoder_hidden_states = self.encoder_hid_proj(encoder_hidden_states, image_embeds)
# 2. pre-process and insert conditioning (ControlNet)
# Note: the added "guided_hint" is the only difference between this implementation and the original UNet2DConditionModel
sample = self.conv_in(sample)
sample = guided_hint + sample if guided_hint is not None else sample
# 3. down
down_block_res_samples = (sample,)
for downsample_block in self.down_blocks:
if hasattr(downsample_block, "has_cross_attention") and downsample_block.has_cross_attention:
sample, res_samples = downsample_block(
hidden_states=sample,
temb=emb,
encoder_hidden_states=encoder_hidden_states,
attention_mask=attention_mask,
cross_attention_kwargs=cross_attention_kwargs,
encoder_attention_mask=encoder_attention_mask,
)
else:
sample, res_samples = downsample_block(hidden_states=sample, temb=emb)
down_block_res_samples += res_samples
if down_block_additional_residuals is not None:
new_down_block_res_samples = ()
for down_block_res_sample, down_block_additional_residual in zip(
down_block_res_samples, down_block_additional_residuals
):
down_block_res_sample = down_block_res_sample + down_block_additional_residual
new_down_block_res_samples = new_down_block_res_samples + (down_block_res_sample,)
down_block_res_samples = new_down_block_res_samples
# 4. mid
if self.mid_block is not None:
sample = self.mid_block(
sample,
emb,
encoder_hidden_states=encoder_hidden_states,
attention_mask=attention_mask,
cross_attention_kwargs=cross_attention_kwargs,
encoder_attention_mask=encoder_attention_mask,
)
if mid_block_additional_residual is not None:
sample = sample + mid_block_additional_residual
# 5. up
for i, upsample_block in enumerate(self.up_blocks):
is_final_block = i == len(self.up_blocks) - 1
res_samples = down_block_res_samples[-len(upsample_block.resnets) :]
down_block_res_samples = down_block_res_samples[: -len(upsample_block.resnets)]
# if we have not reached the final block and need to forward the
# upsample size, we do it here
if not is_final_block and forward_upsample_size:
upsample_size = down_block_res_samples[-1].shape[2:]
if hasattr(upsample_block, "has_cross_attention") and upsample_block.has_cross_attention:
sample = upsample_block(
hidden_states=sample,
temb=emb,
res_hidden_states_tuple=res_samples,
encoder_hidden_states=encoder_hidden_states,
cross_attention_kwargs=cross_attention_kwargs,
upsample_size=upsample_size,
attention_mask=attention_mask,
encoder_attention_mask=encoder_attention_mask,
)
else:
sample = upsample_block(
hidden_states=sample, temb=emb, res_hidden_states_tuple=res_samples, upsample_size=upsample_size
)
# 6. post-process
if self.conv_norm_out:
sample = self.conv_norm_out(sample)
sample = self.conv_act(sample)
sample = self.conv_out(sample)
if not return_dict:
return (sample,)
return UNet2DConditionOutput(sample=sample)

examples/boft_dreambooth/.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
data/

View File

@@ -0,0 +1,165 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# DreamBooth fine-tuning with BOFT
This guide demonstrates how to use BOFT, an orthogonal fine-tuning method, to fine-tune DreamBooth with either the `stabilityai/stable-diffusion-2-1` or the `runwayml/stable-diffusion-v1-5` model.
By using BOFT from 🤗 PEFT, we can significantly reduce the number of trainable parameters while still achieving impressive results in various fine-tuning tasks across different foundation models. BOFT enhances model efficiency by integrating full-rank orthogonal matrices with a butterfly structure into specific model blocks, such as attention blocks, mirroring the approach used in LoRA. During fine-tuning, only these inserted matrices are trained, leaving the original model parameters untouched. During inference, the trainable BOFT parameters can be merged into the original model, eliminating any additional computational costs.
As a member of the **orthogonal finetuning** class, BOFT presents a systematic and principled method for fine-tuning. It possesses several unique properties and has demonstrated superior performance compared to LoRA in a variety of scenarios. For further details on BOFT, please consult the [OFT concept guide in the PEFT documentation](https://huggingface.co/docs/peft/index), the [original BOFT paper](https://arxiv.org/abs/2311.06243) and the [original OFT paper](https://arxiv.org/abs/2306.07280).
In this guide we provide a DreamBooth fine-tuning script that is available in [PEFT's GitHub repo examples](https://github.com/huggingface/peft/tree/main/examples/boft_dreambooth). This implementation is adapted from [PEFT's lora_dreambooth](https://github.com/huggingface/peft/tree/main/examples/lora_dreambooth). You can try it out and fine-tune it on your own custom images.
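To make the mechanics concrete, here is a minimal, illustrative sketch of wrapping a Stable Diffusion UNet with BOFT via 🤗 PEFT. It is not the training script itself, and the `target_modules` names below are assumptions chosen for illustration (the provided script selects the exact modules for you):

```python
from diffusers import UNet2DConditionModel
from peft import BOFTConfig, get_peft_model

# The base UNet stays frozen; BOFT only inserts small orthogonal butterfly matrices.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

config = BOFTConfig(
    boft_block_size=8,               # block size of the orthogonal matrices
    boft_n_butterfly_factor=1,       # 1 recovers vanilla OFT
    boft_dropout=0.1,                # multiplicative dropout
    bias="boft_only",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # illustrative attention projections
)
unet = get_peft_model(unet, config)
unet.print_trainable_parameters()    # only the inserted BOFT parameters are trainable
```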
## Set up your environment
Start by cloning the PEFT repository:
```bash
git clone --recursive https://github.com/huggingface/peft
```
Navigate to the directory containing the training scripts for fine-tuning Dreambooth with BOFT:
```bash
cd peft/examples/boft_dreambooth
```
Set up your environment: install PEFT and all the required libraries. At the time of writing this guide, we recommend installing PEFT from source. The following environment setup should work on A100 and H100 GPUs:
```bash
conda create --name peft python=3.10
conda activate peft
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install xformers -c xformers
pip install -r requirements.txt
pip install git+https://github.com/huggingface/peft
```
## Download the data
The [dreambooth](https://github.com/google/dreambooth) dataset should be cloned automatically into the following structure when you run the training script.
```
boft_dreambooth
├── data
│ ├── data_dir
│ └── dreambooth
│ └── data
│ ├── backpack
│ └── backpack_dog
│ ...
```
You can also put your custom images into `boft_dreambooth/data/dreambooth`.
## Finetune Dreambooth with BOFT
```bash
./train_dreambooth.sh
```
or use the following script arguments:
```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"
```
Here:
- `INSTANCE_DIR`: The directory containing the images that you intend to use for training your model.
- `CLASS_DIR`: The directory containing class-specific images. In this example, we use prior preservation to avoid overfitting and language-drift. For prior preservation, you need other images of the same class as part of the training process. However, these images can be generated and the training script will save them to a local path you specify here.
- `OUTPUT_DIR`: The destination folder for storing the trained model's weights.
To learn more about DreamBooth fine-tuning with prior-preserving loss, check out the [Diffusers documentation](https://huggingface.co/docs/diffusers/training/dreambooth#finetuning-with-priorpreserving-loss).
Launch the training script with `accelerate` and pass hyperparameters, as well as BOFT-specific arguments, such as the following (a worked example of the block-size constraint follows this list):
- `use_boft`: Enables BOFT in the training script.
- `boft_block_size`: the BOFT matrix block size across different layers, expressed as an `int`. A smaller block size results in sparser update matrices with fewer trainable parameters. **Note**: choose a value that divides most layers' `in_features` dimension, e.g. 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
- `boft_block_num`: the number of BOFT matrix blocks across different layers, expressed as an `int`. Fewer blocks result in sparser update matrices with fewer trainable parameters. **Note**: choose a value that divides most layers' `in_features` dimension, e.g. 4, 8, 16. Also, you can only specify either `boft_block_size` or `boft_block_num`, but not both simultaneously, because `boft_block_size` x `boft_block_num` = layer dimension.
- `boft_n_butterfly_factor`: the number of butterfly factors. **Note**: for `boft_n_butterfly_factor=1`, BOFT is the same as vanilla OFT; for `boft_n_butterfly_factor=2`, the effective block size of OFT becomes twice as big and the number of blocks is halved.
- `bias`: specify whether the `bias` parameters should be trained. Can be `none`, `all` or `boft_only`.
- `boft_dropout`: specify the probability of multiplicative dropout.
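As a worked example of the block-size constraint above: since `boft_block_size` x `boft_block_num` must equal the layer dimension, a projection layer with `in_features = 320` admits, for instance, `boft_block_size=8` with `boft_block_num = 320 / 8 = 40`, or `boft_block_num=8` with a block size of `320 / 8 = 40`. In the full argument listing below, `BLOCK_SIZE=0` is presumably treated as unset, so only `boft_block_num=8` takes effect.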
Here's what the full set of script arguments may look like:
```bash
PEFT_TYPE="boft"
BLOCK_NUM=8
BLOCK_SIZE=0
N_BUTTERFLY_FACTOR=1
VALIDATION_PROMPT=${PROMPT_LIST[@]}
INSTANCE_PROMPT="a photo of ${UNIQUE_TOKEN} ${CLASS_TOKEN}"
CLASS_PROMPT="a photo of ${CLASS_TOKEN}"
export MODEL_NAME="stabilityai/stable-diffusion-2-1"
# export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export PROJECT_NAME="dreambooth_${PEFT_TYPE}"
export RUN_NAME="${SELECTED_SUBJECT}_${PEFT_TYPE}_${BLOCK_NUM}${BLOCK_SIZE}${N_BUTTERFLY_FACTOR}"
export INSTANCE_DIR="./data/dreambooth/dataset/${SELECTED_SUBJECT}"
export CLASS_DIR="./data/class_data/${CLASS_TOKEN}"
export OUTPUT_DIR="./data/output/${PEFT_TYPE}"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir="$CLASS_DIR" \
--output_dir=$OUTPUT_DIR \
--wandb_project_name=$PROJECT_NAME \
--wandb_run_name=$RUN_NAME \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="$INSTANCE_PROMPT" \
--validation_prompt="$VALIDATION_PROMPT" \
--class_prompt="$CLASS_PROMPT" \
--resolution=512 \
--train_batch_size=1 \
--num_dataloader_workers=2 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--use_boft \
--boft_block_num=$BLOCK_NUM \
--boft_block_size=$BLOCK_SIZE \
--boft_n_butterfly_factor=$N_BUTTERFLY_FACTOR \
--boft_dropout=0.1 \
--boft_bias="boft_only" \
--learning_rate=3e-5 \
--max_train_steps=1010 \
--checkpointing_steps=200 \
--validation_steps=200 \
--enable_xformers_memory_efficient_attention \
--report_to="wandb" \
```
or use this training script:
```bash
./train_dreambooth.sh $idx
```
where `$idx` corresponds to a different subject.
If you are running this script on Windows, you may need to set `--num_dataloader_workers` to 0.
## Inference with a single adapter
To run inference with the fine-tuned model, open the Jupyter notebook `dreambooth_inference.ipynb` under `./examples/boft_dreambooth` (e.g. with `jupyter notebook`) to generate and visualize images; a minimal loading sketch is shown below.
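For reference, the notebook essentially performs the following steps (a condensed sketch; the checkpoint layout `unet/<epoch>/<adapter_name>` and `text_encoder/<epoch>/<adapter_name>` mirrors what `load_adapter` in the notebook expects):

```python
import os

import torch
from diffusers import StableDiffusionPipeline
from peft import PeftModel

ckpt_dir = "./data/output/boft"        # OUTPUT_DIR used during training
epoch, adapter_name = 200, "backpack_boft_801"

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float32, requires_safety_checker=False
).to("cuda")

# Attach the trained BOFT adapter to the UNet (and the text encoder, if it was trained).
unet_dir = os.path.join(ckpt_dir, f"unet/{epoch}", adapter_name)
pipe.unet = PeftModel.from_pretrained(pipe.unet, unet_dir, adapter_name=adapter_name)

text_encoder_dir = os.path.join(ckpt_dir, f"text_encoder/{epoch}", adapter_name)
if os.path.exists(text_encoder_dir):
    pipe.text_encoder = PeftModel.from_pretrained(
        pipe.text_encoder, text_encoder_dir, adapter_name=adapter_name
    )

image = pipe(
    "a photo of sks backpack on a wooden floor",
    negative_prompt="low quality, blurry, unfinished",
    num_inference_steps=50,
    guidance_scale=7,
).images[0]
image.save("backpack.png")
```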

View File

@@ -0,0 +1,186 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "acab479f",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"import torch\n",
"from accelerate.logging import get_logger\n",
"from diffusers import StableDiffusionPipeline\n",
"from diffusers.utils import check_min_version\n",
"\n",
"from peft import PeftModel\n",
"\n",
"# Will error if the minimal version of diffusers is not installed. Remove at your own risks.\n",
"check_min_version(\"0.10.0.dev0\")\n",
"\n",
"logger = get_logger(__name__)\n",
"\n",
"MODEL_NAME = \"stabilityai/stable-diffusion-2-1\"\n",
"# MODEL_NAME=\"runwayml/stable-diffusion-v1-5\"\n",
"\n",
"PEFT_TYPE=\"boft\"\n",
"BLOCK_NUM=8\n",
"BLOCK_SIZE=0\n",
"N_BUTTERFLY_FACTOR=1\n",
"SELECTED_SUBJECT=\"backpack\"\n",
"EPOCH_IDX = 200\n",
"\n",
"PROJECT_NAME=f\"dreambooth_{PEFT_TYPE}\"\n",
"RUN_NAME=f\"{SELECTED_SUBJECT}_{PEFT_TYPE}_{BLOCK_NUM}{BLOCK_SIZE}{N_BUTTERFLY_FACTOR}\"\n",
"OUTPUT_DIR=f\"./data/output/{PEFT_TYPE}\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06cfd506",
"metadata": {},
"outputs": [],
"source": [
"def get_boft_sd_pipeline(\n",
" ckpt_dir, base_model_name_or_path=None, epoch=int, dtype=torch.float32, device=\"cuda\", adapter_name=\"default\"\n",
"):\n",
"\n",
" if base_model_name_or_path is None:\n",
" raise ValueError(\"Please specify the base model name or path\")\n",
"\n",
" pipe = StableDiffusionPipeline.from_pretrained(\n",
" base_model_name_or_path, torch_dtype=dtype, requires_safety_checker=False\n",
" ).to(device)\n",
" \n",
" load_adapter(pipe, ckpt_dir, epoch, adapter_name)\n",
"\n",
" if dtype in (torch.float16, torch.bfloat16):\n",
" pipe.unet.half()\n",
" pipe.text_encoder.half()\n",
"\n",
" pipe.to(device)\n",
" return pipe\n",
"\n",
"\n",
"def load_adapter(pipe, ckpt_dir, epoch, adapter_name=\"default\"):\n",
" \n",
" unet_sub_dir = os.path.join(ckpt_dir, f\"unet/{epoch}\", adapter_name)\n",
" text_encoder_sub_dir = os.path.join(ckpt_dir, f\"text_encoder/{epoch}\", adapter_name)\n",
" \n",
" if isinstance(pipe.unet, PeftModel):\n",
" pipe.unet.load_adapter(unet_sub_dir, adapter_name=adapter_name)\n",
" else:\n",
" pipe.unet = PeftModel.from_pretrained(pipe.unet, unet_sub_dir, adapter_name=adapter_name)\n",
" \n",
" if os.path.exists(text_encoder_sub_dir):\n",
" if isinstance(pipe.text_encoder, PeftModel):\n",
" pipe.text_encoder.load_adapter(text_encoder_sub_dir, adapter_name=adapter_name)\n",
" else:\n",
" pipe.text_encoder = PeftModel.from_pretrained(pipe.text_encoder, text_encoder_sub_dir, adapter_name=adapter_name)\n",
" \n",
"\n",
"def set_adapter(pipe, adapter_name):\n",
" pipe.unet.set_adapter(adapter_name)\n",
" if isinstance(pipe.text_encoder, PeftModel):\n",
" pipe.text_encoder.set_adapter(adapter_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98a0d8ac",
"metadata": {},
"outputs": [],
"source": [
"prompt = \"a photo of sks backpack on a wooden floor\"\n",
"negative_prompt = \"low quality, blurry, unfinished\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4e888d2",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"pipe = get_boft_sd_pipeline(OUTPUT_DIR, MODEL_NAME, EPOCH_IDX, adapter_name=RUN_NAME)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f1c1a1c0",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"image = pipe(prompt, num_inference_steps=50, guidance_scale=7, negative_prompt=negative_prompt).images[0]\n",
"image"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3a1aafdf-8cf7-4e47-9471-26478034245e",
"metadata": {},
"outputs": [],
"source": [
"# load and reset another adapter\n",
"# WARNING: requires training DreamBooth with `boft_bias=None`\n",
"\n",
"SELECTED_SUBJECT=\"dog\"\n",
"EPOCH_IDX = 200\n",
"RUN_NAME=f\"{SELECTED_SUBJECT}_{PEFT_TYPE}_{BLOCK_NUM}{BLOCK_SIZE}{N_BUTTERFLY_FACTOR}\"\n",
"\n",
"load_adapter(pipe, OUTPUT_DIR, epoch=EPOCH_IDX, adapter_name=RUN_NAME)\n",
"set_adapter(pipe, adapter_name=RUN_NAME)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7091ad0-2005-4528-afc1-4f9d70a9a535",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"prompt = \"a photo of sks dog running on the beach\"\n",
"negative_prompt = \"low quality, blurry, unfinished\"\n",
"image = pipe(prompt, num_inference_steps=50, guidance_scale=7, negative_prompt=negative_prompt).images[0]\n",
"image"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f534eca2-94a4-432b-b092-7149ac44b12f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:peft] *",
"language": "python",
"name": "conda-env-peft-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

Some files were not shown because too many files have changed in this diff.