Make OFT faster and more memory efficient. This new version of OFT is
not backwards compatible with older checkpoints, and new checkpoints
cannot be loaded by older PEFT versions. To load older checkpoints,
downgrade PEFT to 0.15.2 or lower.
* Refactor test_adaption_prompt.py
- The tests did not really use PeftCommonTester, so it was removed
- Removed the skip for when llama or mistral is not available
- Parametrized tests instead of duplicating them (see the sketch after this list)
- Used small models from the Hub instead of creating new ones
- Test coverage now misses 3 more lines around checkpoint loading; this is
most likely unrelated to adaption prompt and instead due to using Hub models
instead of creating new ones
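As an illustration of the refactoring pattern, here is a minimal sketch; the model IDs and the test body are placeholders, not the actual test code:
```python
import pytest
from transformers import AutoModelForCausalLM

# Placeholder tiny Hub models used for illustration only
TEST_MODELS = [
    "hf-internal-testing/tiny-random-LlamaForCausalLM",
    "hf-internal-testing/tiny-random-MistralForCausalLM",
]

@pytest.mark.parametrize("model_id", TEST_MODELS)
def test_adaption_prompt_forward(model_id):
    # Load a small pretrained model from the Hub instead of building one inside the test
    model = AutoModelForCausalLM.from_pretrained(model_id)
    assert model is not None  # placeholder assertion
```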
* Refactor test_feature_extraction.py
Pretty straightforward, test coverage is 100% identical.
* Refactor test_multitask_prompt_tuning
Same arguments apply as for test_adaption_prompt.py
* Refactor test_stablediffusion.py
This was pretty straightforward. After refactoring, the test coverage
was 100% the same.
I noticed, however, that these tests did not cover LoKr; they only
pretended to:
37f8dc3458/tests/test_stablediffusion.py (L113-L114)
Thus I added LoKr to the test matrix, after which the test coverage is
of course different, but still fine.
* Skip LoKr merging tests when not on CUDA
For some reason, the outputs differ after merging. However, I verified
locally that this was already the case before this refactor, so let's just
skip for now, as it is out of scope.
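For reference, such a skip can be expressed with a standard pytest marker; the test name and body below are illustrative, not the actual test code:
```python
import pytest
import torch

# Skip the merging check off-CUDA, where the merged and unmerged outputs differ
@pytest.mark.skipif(
    not torch.cuda.is_available(),
    reason="LoKr merge outputs differ when not on CUDA",
)
def test_lokr_merged_output_matches_unmerged():
    ...
```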
So far, the tests have been using hf-internal-testing/tiny-stable-diffusion-torch
for testing diffusion models. However, this model has some issues:
- it still uses pickle (.bin) instead of safetensors
- it triggers a FutureWarning because of its config
The tests now use hf-internal-testing/tiny-sd-pipe instead, which doesn't have
those issues.
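For context, a minimal sketch of loading the replacement model; the actual tests may load individual components (unet, text encoder, etc.) rather than the full pipeline:
```python
from diffusers import DiffusionPipeline

# The replacement test model ships safetensors weights and an up-to-date config,
# so loading it avoids the pickle/.bin path and the FutureWarning.
pipe = DiffusionPipeline.from_pretrained("hf-internal-testing/tiny-sd-pipe")
```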
The previous OFT implementation contained a few errors, which are fixed now.
Unfortunately, this makes previous OFT checkpoints invalid, which is why an
error will be raised. Users are instructed to either retrain the OFT adapter or
switch to an old PEFT version.
This PR allows initializing the adapter weights as empty, i.e. on the meta
device, by passing low_cpu_mem_usage=True.
Why would this be useful? For PEFT training, it is indeed not useful, as
we need the real weights in order to train the model. However, when
loading a trained PEFT adapter, it is unnecessary to initialize the
adapters for real, as we override them with the loaded weights later.
In the grand scheme of things, loading the base model will typically be
much slower, but if the user loads, say, dozens of adapters, the
overhead could add up. Of course, besides loading the model, this has no
performance impact and is thus not a high priority feature.
For the time being, this is completely opt-in. However, it should be safe to
make this the default for loading adapters, so we may change the default there
in the future.
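As a usage sketch (the base model ID and adapter path below are placeholders):
```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Adapter weights are created on the meta device and only materialized when the
# checkpoint weights are loaded into them, skipping the redundant initialization.
peft_model = PeftModel.from_pretrained(
    base_model,
    "path/to/trained-adapter",  # placeholder checkpoint path
    low_cpu_mem_usage=True,
)
```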
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Support OFT
* add test
* Update README
* fix code quality
* fix test
* Skip 1 test
* fix eps rule and add more tests
* feat: added examples to new OFT method
* fix: removed wrong arguments from model example
* fix: changed name of inference file
* fix: changed prompt variable
* fix docs
* fix: dreambooth inference revision based on feedback
* fix: address review from BenjaminBossan
* apply safe merge
* del partially
* refactor oft
* refactor oft
* del unused line
* del unused line
* fix skip on Windows
* skip test
* Add comments about where the bias is added
* rename orig_weights to new_weights
* use inverse instead of linalg.inv
* delete alpha and scaling
---------
Co-authored-by: Lukas Kuhn <lukaskuhn.lku@gmail.com>
Co-authored-by: Lukas Kuhn <lukas.kuhn@deutschebahn.com>
This PR deals with some issues with disabling adapters:
- a typo in active.adapter
- the prompt encoder could be on the wrong device
- when using prompt learning + generate, disabling did not work
For the last point, there is a somewhat ugly fix in place for now,
pending a more comprehensive refactor (a comment was added to that
effect).
Comprehensive tests were added to check that everything works now.
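To illustrate the behavior now under test, a minimal sketch; the model ID and config values are placeholders:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, TaskType, get_peft_model

model_id = "facebook/opt-125m"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(model_id)

peft_config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=10)
peft_model = get_peft_model(base_model, peft_config)

inputs = tokenizer("Hello", return_tensors="pt")
with peft_model.disable_adapter():
    # No virtual tokens are prepended here, so generation should match the base model
    disabled_output = peft_model.generate(**inputs, max_new_tokens=5)
```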
The following tests are still not working:
- adaption prompt
- seq2seq with prompt tuning/prompt encoding
- stable diffusion is a little flaky, but the test is hopefully robust enough
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>