Compare commits

...

95 Commits

Author SHA1 Message Date
f6b0a2dd43 ENH Small speedups to adapter injection (#2785)
See
https://github.com/huggingface/diffusers/issues/11816#issuecomment-3281290153

This PR implements two small improvements to the speed of adapter
injection. On a benchmark based on the linked issue, the first change
leads to a speedup of 21% and the second to another 3%. That's not
much, but as the changes don't make the code more complicated, there is
really no reason not to take them.

The optimizations don't add any functional change but are simply based
on not recomputing the same values multiple times. Therefore, unless I'm
missing something, they should strictly improve runtime.
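
For illustration, the kind of change involved is hoisting values that stay constant out of a loop instead of recomputing them per iteration (a generic sketch, not the actual PEFT code):

def find_target_modules(model, target_names):
    # Generic sketch, not the actual PEFT code: build the lookup set once,
    # instead of recomputing it for every module visited in the loop.
    targets = set(target_names)
    return [name for name, _ in model.named_modules() if name.split(".")[-1] in targets]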
2025-09-23 13:27:49 +02:00
f1b83646a6 The great deduplication (#2771)
Deduplicate a lot of redundant code from the PEFT methods' model.py files:

merge_and_unload
unload
delete_adapter
set_adapter
enable_adapter_layers
disable_adapter_layers
_replace_module
_unload_and_optionally_merge
_mark_only_adapters_as_trainable
_check_new_adapter_config
_check_target_module_exists
_prepare_adapter_config
__getattr__
get_peft_config_as_dict (fully deleted)

Related changes:

A new module, functional.py, is introduced, which contains functions
(just reimported from elsewhere) that can be useful for libraries that
want to integrate PEFT. I would suggest that we should treat them as
public API and thus guarantee backwards compatibility.
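
For instance, an integrating library might do something along these lines (a hedged sketch: inject_adapter_in_model is existing public API, and the assumption here is that it is among the re-exports in functional.py):

import torch.nn as nn
from peft import LoraConfig
from peft.functional import inject_adapter_in_model  # assumed re-export

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.v_proj = nn.Linear(8, 8)

config = LoraConfig(r=4, target_modules=["q_proj", "v_proj"])
model = inject_adapter_in_model(config, TinyBlock())  # adds LoRA layers in place, no PeftModel wrapper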

I also deduplicated almost identical
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING constants by copying
them from LoRA and only overriding a few values that differ. Moreover,
some PEFT methods didn't have their own
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING but used the one from
LoRA instead. They now each have their own constant, which is a copy of
the LoRA one.
2025-09-23 13:26:35 +02:00
b774fd901e TST Add missing configs to test_config.py (#2781)
The test_config.py tests were missing a few configs from recently added
PEFT methods. Those are now included. After adding those, it was
revealed that for C3A and trainable tokens, super().__post_init__() was
not being called. This is now done.
2025-09-19 17:52:58 +02:00
20a9829f76 FIX Account for rsLoRA scaling in set_scale (#2775) 2025-09-16 11:30:29 +02:00
1806c1651a CHORE Update and pin (commit hash) GitHub actions (#2779)
Some GH actions didn't have a pinned commit hash while others did
because of Zizmor. Now all actions have pinned commit hashes.
2025-09-11 11:12:23 +02:00
13fa0aea7e FIX: Wrong coupling between requires_grad and the active adapter (#2765)
Description

At the moment, we strongly couple the active adapter with
requires_grad=True. Concretely, when we call model.set_adapter(name), we
automatically assume that this adapter should not only be made active
but also have its requires_grad set to True.

For the purpose of training PEFT models, this is fair. However, when
loading PEFT models for inference, this is not desired. Generally, for
inference, we don't need requires_grad=True, but as is, it is enabled.

Generally, this is not a severe bug, since in the inference code, we
don't perform any updates, thus we don't inadvertently update a weight
because it wrongly has requires_grad=True -- this is probably why it
went unnoticed so far. However, it could lead to worse runtime
performance and memory overhead when PyTorch records grads for those
parameters (which it shouldn't if called with torch.inference_mode, but
some users may forget to use this). Therefore, this bug is still worth
fixing.

Example

With `modules_to_save`

A very basic example where the current PEFT fails:

import os
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

model_id = "facebook/opt-125m"
path = "/tmp/peft/2759"
# create and save a LoRA adapter with modules_to_save once
if not os.path.exists(path + "/adapter_model.safetensors"):
    model = AutoModelForCausalLM.from_pretrained(model_id)
    config = LoraConfig(target_modules=["q_proj", "v_proj"], modules_to_save=["lm_head"], r=8)
    model = get_peft_model(model, config)
    model.save_pretrained(path)
    del model

# reload the adapter, this time for inference only
model = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(model, path)

`modules_to_save` should not have grads enabled, but currently it does.

With multiple adapters

There is also an issue when loading more than one adapter:

model = PeftModel.from_pretrained(...)
assert not any(p.requires_grad for p in model.parameters())  # works

So far, so good, the first adapter does not have `requires_grad`.

model.load_adapter(...)
assert not any(p.requires_grad for p in model.parameters())  # fails

The load_adapter call inadvertently sets requires_grad=True for the
weights of the _first_ adapter. The reason why this happens is because
when the second adapter is loaded, we call set_adapter with the first
adapter to ensure that it remains the active adapter. However, due to
the coupling of active adapter and requires_grad, this would result in
setting requires_grad=True for the first adapter.

The PR relaxes this coupling by allowing set_adapter to be called with an
additional argument, inference_mode. If it is set to True, requires_grad
will not be enabled, even if the adapter is activated.
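
Continuing the first example above (the modules_to_save one), the relaxed call might look roughly like this (a sketch; it assumes the new inference_mode keyword is exposed on the model's set_adapter):

# Hedged sketch, assuming set_adapter exposes the new inference_mode keyword:
# activate the adapter without flipping requires_grad back on.
model.set_adapter("default", inference_mode=True)
assert not any(p.requires_grad for p in model.parameters())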

The example above would also fail for modules_to_save and trainable
tokens, not only for the LoRA/LoHa/... weights.

Still open bugs

The proposed solution is unfortunately not perfect. Right now, we do
pass inference_mode based on the PEFT config of the adapter being added,
which helps with the original issue described above. However, even this
is not absolutely correct, because inference_mode of the second adapter
does not necessarily have the same value as inference_mode of the first
adapter. To illustrate how this can go wrong, I added an xfailing test:

test_loading_model_requires_grad_set_correctly_switch_inference_mode

I believe that this use case is rarer than the ones described at the
beginning, so IMO it is okay to have this bug because we fix more common
bugs. However, LMK if you disagree.

Related to this, I noticed that many tests in
test_custom_models.TestRequiresGrad had code like this:

config0 = FooConfig(...)
peft_model = get_peft_model(MLP(), config0)
config1 = FooConfig(..., inference_mode=True)  # <==
peft_model.add_adapter("adapter1", config1)

This now fails because of the reason just given. I removed
inference_mode=True here and the tests pass again.

Note that the only reason why inference_mode=True was passed here is
because AdaLoRA cannot load 2 adapters in training mode and thus
requires this. Later PEFT methods without this restriction blindly
copied the AdaLoRA test. For those PEFT methods, I removed
inference_mode=True.

However, this also means that the AdaLoRA tests now fail. I thus marked
them as xfail.

To properly fix this bug, I think we would have to refactor the code to
isolate set_adapter (i.e. determining the active adapter) and setting
requires_grad into separate code paths, as they're orthogonal. Moreover,
these attributes are set all over the place, which makes it hard to
reason about where they are changed. This should be streamlined.

Making these changes while not breaking any existing code is not
trivial (or maybe impossible even). Therefore, I went the easier way for
the time being with this PR. Maybe a bigger refactor could be envisioned
for a version 1.0 release of PEFT.

Related changes

While working on this, I noticed that LNTuning was completely buggy when
calling set_adapter. This is now fixed.

Moreover, since I had to touch update_layer everywhere, I ensured that
they all take kwargs for consistency.
2025-09-08 19:49:29 +02:00
42db980676 Add Arrow + GenKnowSub to LoRA (#2644)
This PR adds support for Arrow, a modular routing mechanism for LoRA experts introduced here, as well as the refinement method GenKnowSub, proposed in our ACL 2025 Main Conference paper. GenKnowSub enhances Arrow by subtracting a general-domain LoRA from task-specific ones prior to routing, leading to improved generalisation and modularity.
2025-09-08 14:21:37 +02:00
ed5c6eaa1a Replace from_legacy_cache method with constructors (#2767)
Replace Cache.from_legacy_cache method with init.
2025-09-08 13:49:25 +02:00
92e15573ac CHORE Upgrade trufflehog GitHub action to 3.90.5 (#2770)
Maybe solves the trufflehog false positive, maybe not.
2025-09-08 13:47:02 +02:00
5ef8e85d1f FIX X-LoRA forward hook issue during generate (#2761)
There was an issue where forward hooks accumulated during generation:
one hook was registered per forward step, and generate calls forward
multiple times. This was already undesirable, but to make it worse, only
the last hook was removed afterwards, so the hooks kept piling up.
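
The general PyTorch pattern to avoid this is to keep every handle returned by register_forward_pre_hook and remove them all afterwards; a minimal illustration (not the actual X-LoRA code):

import torch
import torch.nn as nn

layer = nn.Linear(4, 4)

def noop_pre_hook(module, args):
    return args  # stand-in for the per-step scaling logic

# e.g. one registration per generation step
handles = [layer.register_forward_pre_hook(noop_pre_hook) for _ in range(3)]
layer(torch.randn(1, 4))
for handle in handles:  # remove *all* registered hooks, not just the last one
    handle.remove()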
2025-09-08 13:46:31 +02:00
c81363bd4e Support dataclass model configs (#2778)
LeRobot uses dataclasses to manage policy configs. If we want to
support LeRobot policy fine-tuning, it'd be easiest to support
these configs in `get_model_config`.

While it is possible to fix this on LeRobot's side (adding a to_dict implementation to the config classes), I think it'd be cleaner to support it on our side, since the cost is relatively low and dataclasses are getting more popular anyway.
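
The kind of handling this enables might look roughly like this (a sketch of the idea, not necessarily PEFT's exact get_model_config implementation):

from dataclasses import asdict, is_dataclass

def get_model_config_sketch(model):
    # normalize transformers-style configs and dataclass-based configs
    # (e.g. LeRobot policy configs) to a plain dict
    config = getattr(model, "config", {})
    if is_dataclass(config):
        return asdict(config)
    if hasattr(config, "to_dict"):
        return config.to_dict()
    return config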

Thanks @xliu0105 for raising this issue and proposing a fix.
2025-09-08 13:35:47 +02:00
5d97453235 FIX Deprecated key_cache attribute on Cache pt 2 (#2753)
In #2737, we fixed some code that relied on the deprecated attribute,
but some instances were missed, as that code only runs on the nightly CI
with multiple GPUs. This PR fixes this.

Note that the original transformers code that this solution was based on
no longer exists, as transformers now initializes the cache lazily, so
pre-allocating the keys and values to the correct device is not
necessary. But since prefix tuning inserts "virtual" keys/values, we
still have to ensure the correct device in PEFT.

I have tested the failing tests locally and they pass.
2025-09-04 14:47:29 +02:00
2ea5377ee3 TST FIX Failing AutoAWQ test with torch 2.8 (#2752)
There has been a failing AWQ test since torch 2.6, which is marked as
xfail for torch=2.7. However, torch 2.8 is now out and the test is still
failing. Therefore, the xfail now checks for torch>=2.7.

As AWQ is no longer being maintained, we should expect this situation to
deteriorate over time and eventually we'll have to remove it. But for
the time being, it still appears to mostly work, so I suggest we leave
it as is.
2025-09-03 19:25:05 +02:00
de60e88b6b Fix missing code start in docs (#2768)
There was a minor typo in a suggestion of PR #2609 which broke code formatting for one code sample.

This is a simple fix for that.
2025-09-03 18:37:52 +02:00
293aea5df6 Support for Activated LoRA (#2609)
This PR migrates Activated LoRA (aLoRA) support from a standalone GitHub repository (see above) to PEFT itself.

Note there is also an active PR for vLLM inference support for Activated LoRA: vllm-project/vllm#19710 . There are also collections of aLoRA models on the Hugging Face Hub (in the ibm-granite org); note that these preexisting models run off of the standalone GitHub repo and will be updated to work with this new PEFT feature if merged.

Description of changes: Activated LoRA is a modification of the LoRA architecture that "activates" the adapter weights only on tokens coming after a specified invocation_string. This makes the KV values for the string coming before the activation match the KV values of the base model. As a result, the KV cache for the input is interchangeable between the base model and the adapter model, which allows for major speedups in inference pipelines (e.g. agentic pipelines) that want to use both base models and adapter models. See the paper for a detailed exploration of use cases and further elaboration.

Other notes:

The crux of the changes is really in layer.py. Everything else is simply managing the alora_offsets quantity, which defines where the weights start to be activated. This is determined by scanning input strings for the invocation_string defined in the aLoraConfig.
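
A rough sketch of what computing such an offset can look like (a hypothetical helper, not the actual implementation):

def find_alora_offset(input_ids, invocation_ids):
    # hypothetical helper, not the actual implementation: locate the last
    # occurrence of the invocation token sequence in the input ids
    n = len(invocation_ids)
    for start in range(len(input_ids) - n, -1, -1):
        if input_ids[start:start + n] == invocation_ids:
            return start
    return None  # invocation string not found; adapter stays inactive

offset = find_alora_offset([5, 8, 13, 21, 34], [13, 21])  # -> 2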
    
I believe that aLoRA really only makes sense for CausalLMs, hence I've only implemented this for that model type.

Merging doesn't make sense for aLoRA adapters since the weights are not universally applied to all tokens.
    
I used the LoRA code as a starting point, but did not implement various seemingly extra features in that code.

As of now, invocation_string should probably start and end with special tokens, to avoid tokenizer issues at the boundary. Open to suggestions on how to make this more general if needed.

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-09-03 18:26:50 +02:00
a3197b1ec5 FIX: Multiple active adapters with auxiliary layers (#2758)
This PR fixes a few issues with the handling of active adapters for
auxiliary modules.

1. Calling set_adapter on the model.base_model

When calling peft_model.set_adapter, it is not possible to activate more
than one adapter, as not all PEFT methods support that. However, many
PEFT methods like LoRA do, in which case users should call
peft_model.base_model.set_adapter(['default', 'other']).

Now the issue was that the activation of auxiliary modules was only done
on PeftModel.set_adapter. This means that if users are calling
peft_model.base_model.set_adapter (i.e. LoraModel.set_adapter etc.), the
auxiliary adapters were not activated.

This PR fixes this issue by ensuring that even if the user activates
adapters like this, the auxiliary modules are activated. When users
activate more than one adapter, additional checks are performed to
ensure that they are not activating multiple auxiliary modules on the
same module.

Note that some existing PEFT code could start raising errors now because
of the change. However, this PEFT code is buggy right now so IMO it is
fine to raise an error.

2. Adding multiple adapters with non-overlapping auxiliary modules

Furthermore, I found an activation issue that could occur when adding
multiple adapters with non-overlapping auxiliary modules. Normally, when
the second/third/... adapter is added, it is not automatically
activated. However, when these additional adapters target new auxiliary
modules, those would incorrectly be activated (because they look like
they're the first adapter). This has also been fixed.

Right now, we don't allow users to activate multiple auxiliary adapters
on the same module. However, this limitation could be considered too
strict:

- For trainable tokens, as long as the indices don't overlap, there is no conflict.
- For modules_to_save, we could theoretically determine the "delta_weight" as new_weight - original_weight, then add up all delta_weights.

This is not implemented in the PR for now to prevent it becoming even more complex.
2025-08-29 17:54:19 +02:00
e62aee44e3 feat(lokr, loha): add 1x1 Conv2d and Conv1d support (#2515)
This PR enhances the LoKr and LoHa adapter implementations within PEFT by adding proper support for:

 - 1x1 Convolutions (nn.Conv2d with kernel_size=(1,1))
 - nn.Conv1d layers (specifically including kernel_size=1).

This allows LoKr/LoHa adapters to be correctly applied to a wider range of modern architectures that heavily utilize these layer types (e.g., ResNet bottlenecks, MobileNet pointwise convolutions, various Transformer blocks). The implementation aims for optimized handling, inspired by LoRA's 1x1 optimization, while maintaining consistency with existing LyCORIS patterns in PEFT. Parts of the implementation logic, particularly for parameter factorization and layer adaptation, were adapted from the KohakuBlueleaf/LyCORIS library (e.g., lycoris/modules/loha.py), consistent with existing acknowledgements within the PEFT codebase.

This includes:

    New Conv1d adapter layer classes for both LoKr and LoHa, mirroring Conv2d.
    Updated layers_mapping in LoKrModel and LoHaModel to recognize Conv1d.
    Enhanced create_adapter_parameters methods in LoKr/LoHa to correctly initialize parameters based on Conv1d weight shapes.
    Refactored update_layer methods in LoKr/LoHa to:
        Detect Conv1d layers.
        Implement specific logic for 1x1 Conv2d and kernel_size=1 Conv1d, notably disabling use_effective_conv2d where appropriate for direct matrix handling.
        Ensure correct shape calculations for factorization.
    Added detection flags (is_1x1_conv2d, is_1_conv1d) in get_delta_weight methods as hooks for potential future computation optimizations (without altering current paths).
    Maintained backward compatibility; changes are additive and do not affect existing functionality for other layer types or kernel sizes.
    Followed established PEFT/LyCORIS coding patterns for consistency.
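
Applying one of these adapters to such layers might look like this (a sketch using the usual PEFT config API; the toy model below is made up):

import torch
import torch.nn as nn
from peft import LoKrConfig, get_peft_model

class PointwiseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.pw_conv2d = nn.Conv2d(16, 32, kernel_size=1)  # 1x1 Conv2d
        self.conv1d = nn.Conv1d(32, 32, kernel_size=1)     # kernel_size=1 Conv1d

    def forward(self, x):
        return self.conv1d(self.pw_conv2d(x).flatten(2))

config = LoKrConfig(target_modules=["pw_conv2d", "conv1d"])  # same idea for LoHaConfig
model = get_peft_model(PointwiseNet(), config)
out = model(torch.randn(2, 16, 8, 8))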


---------

Co-authored-by: Kabir Grewal <kabirgrewal@Kabirs-MacBook-Pro-5.local>
2025-08-27 13:07:05 +02:00
246fe4db7c DOC Update BOFT conceptual guide (#2744) 2025-08-26 11:23:27 +02:00
2d9b22f4c0 FIX: DynamicCache key_cache attribute deprecation (#2737)
Resolves failing CI with transformers source install.

The key_cache attribute on DynamicCache is deprecated and will be
removed in the 4.56.0 transformers release. Update the cache dict
in-place instead.
2025-08-26 10:37:12 +02:00
2a27f0e00c Bump version to 0.17.2.dev0 after release (#2748) 2025-08-21 17:58:14 +02:00
41c07f0445 FIX: DynamicCache max_cache_len attribute error (#2735)
Resolves current CI errors with prefix tuning.

Due to some recent changes in transformers (surfaced by
https://github.com/huggingface/transformers/pull/39797), checking
hasattr(cache, max_cache_len) results in an error. This PR fixes it.

Moreover, that PR also changed the argument order to initialize
HybridCache (will probably also be reverted in transformers), which is
also taken into account in this PR by only using keyword arguments.

Finally, HybridCache will be deprecated and later removed, so move the
import inside a version guard.
2025-08-21 16:24:04 +02:00
ce5c2044f1 FEAT RoAd: 2D Rotary Adaptation (#2678)
Implements RoAd from https://arxiv.org/pdf/2409.00119

Supports mixed adapter batches.
2025-08-19 15:45:38 +02:00
b5ace6a8c4 CHORE: Clean up config kwargs in custom model tests (#2736)
Resolves #2695

For some PEFT methods, there was a bit of a mess when it comes to how
the init_weights argument was set in test_custom_models.py. The default
kwargs for the tests should be that the PEFT method is initialized as an
identity transform, and for specific tests we want to disable that. Note
that most PEFT methods are initialized by default to be identity
transforms, which is why the argument does not need to be set
explicitly, but it's not true for all PEFT methods.

With this PR, SHiRA, C3A, and FourierFT are now initialized to be
consistent with this. This made it possible to remove some extra
handling of those methods which was intermingled with certain tests.

Moreover, test_custom_models.py now uses the set_init_weights_false
helper function where appropriate.

While working on this, I also cleaned up a bit the docs for the
init_weights arguments of these PEFT methods where appropriate.

I added some clarifying comments.

For test_unload_adapter, I simplified a config type check and
rewrote it to load the base model only once.

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-08-19 11:55:25 +02:00
480929537f CI: Allow CI to pass even if MacOS tests error (#2715)
Also fix Zizmor complaint about wrong check.
2025-08-19 11:53:28 +02:00
04d41cbcd0 ENH Enable TP for LoRA linear layers (#2741)
Enables tensor parallelism.
2025-08-14 20:39:33 +02:00
eb1a25abfb CHORE: Upgrade ruff to ~0.12.8 (#2734)
Subjectively, there have been more issues recently with contributor PRs
being rejected by ruff. This could possibly be caused by them using a
different ruff version (presumably: more recent). This PR upgrades ruff
to the latest version to hopefully reduce these issues.

The only change needed to make this ruff version pass was to disable
UP045. This rule requires changing code like:

x: Optional[int]

into

x: int | None

in 220 places. Personally, I don't think it's crucial. Moreover, ruff
won't fix this automatically, except with --unsafe-fixes (note that Python
3.9 needs a __future__ import for this, so that could be the reason). My
preference is thus just to disable the rule, but LMK if you disagree.
2025-08-14 18:03:38 +02:00
47961bb547 FIX Dataset download in docs and examples (#2708)
Co-authored-by: Camilo Leonel Amadio <camilo.amadio@microchip.com>
2025-08-12 20:00:06 +02:00
a2c6612b12 FIX Multiple issues with target_parameters (#2710)
There are a few issues with target_parameters that are fixed in this PR.

Existing parametrizations

When using target_parameters with LoRA, after the forward call finishes,
the LoRA parametrization is removed. However, this also used to remove
all other parametrizations on the same parameter, which is bad. With
this PR, only the LoRA parametrization is removed.

Module repr

This PR also extends the __repr__ of lora.ParamWrapper to contain the
parameter name, which makes it more useful.

Extend testing

Added a tiny gpt-oss model to the target_parameters test suite.

Multiple LoRA adapters with target_parameters

There is an issue when adding a second LoRA adapter with
target_parameters, where this second adapter would not actually be
applied correctly. The corresponding unit test was too lax to notice the
bug. This is not easy to fix, so for now we forbid adding a second
adapter with target_parameters. This is very strict but it's better than
having silent errors.

Although it was possible to fix that specific issue, the solution
resulted in ever more deeply nested adapters (i.e. with multiple
.base_layer). This in turn results in those infixes becoming part of the
state_dict. But then we cannot load the individual adapters correctly,
unless the model is restored in the exact same order as it was
previously created. This is not normally a requirement in PEFT (e.g. I
can create a model with two adapters and later decide to load only one
of them).

In the long run, we need to think about solutions that would allow this.
It may require some form of normalization of the layers to prevent ever
deeper nesting. Also, what is ugly right now is that, given that the
LoRA lives on a module but actually targets one of possibly multiple
parameters, the LoRA weights don't reference said parameter by name.
That means, purely from the state_dict, it is unclear which
parameter a LoRA weight belongs to. Ideally, this should be encoded in
the LoRA weight key.
2025-08-12 13:59:29 +02:00
95df499d87 ENH Support XPU in text gen benchmark (#2730)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-12 11:08:43 +02:00
06b54d8a0d ENH Support XPU for SFT training script (#2709)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-11 14:35:05 +02:00
a90003f0ed ENH Make BufferDict repr accelerator agnostic (#2731)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-08 12:07:46 +02:00
9b420cc9c7 ENH Support XPU for seq clf examples (#2732)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-08 12:07:20 +02:00
a4b41e7924 ENH Support XPU in train_memory.py script (#2729)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-08 12:06:46 +02:00
e98a59ec2d DOC Make docs more device agnostic (e.g. XPU) (#2728)
Also adjusted some more examples.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-08 12:06:22 +02:00
7f7463548a ENH Update token clf/NER examples, support XPU (#2727)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-08 12:05:38 +02:00
a72bbaabf7 ENH Support XPU for SD dreambooth example (#2726)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-08 12:05:05 +02:00
766a9776bb ENH Update bnb 8bit examples, support XPU (#2723)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-08 12:04:29 +02:00
a475f56c81 Updated MetaMathQA results (#2686)
- Updated results for OFT, C3A and SHiRA
- New results for trainable tokens (for completeness)

Trainable tokens wasn't tuned a lot; we could probably search for better
tokens and increase the learning rate. We can do this later.
2025-08-07 14:57:50 +02:00
ee4a2b86be FIX: Warn when using LoRA bias w/o base layer bias (#2725)
When setting lora_bias=True, a bias term is added to lora_B (#2237).
However, to merge this LoRA adapter, we need the base layer to also have
a bias. This has not been checked so far.

With this PR, we will now warn the user when we detect this situation.
Thus they can decide if they want to continue with this setting or not.
If they don't intend to merge, they're fine.

On top of this, when trying to merge in this situation, we now raise an
appropriate error that clearly explains why merging failed.

About PeftWarning

This PR adds a new warning class, PeftWarning. This makes it easier for
users to add PEFT specific warning filters (say, to ignore them or to
raise an error).

There are many more warnings in PEFT that could be migrated to this new
warning class (or a subclass where appropriate). This is outside the
scope of this PR.
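
For example, users could escalate PEFT warnings to errors in their test suite (assuming PeftWarning is importable from the top-level peft namespace; the exact import path may differ):

import warnings
from peft import PeftWarning  # assumed import location

# treat all PEFT-specific warnings as errors (or use "ignore" to silence them)
warnings.filterwarnings("error", category=PeftWarning)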

Alternatives

1. We considered raising an error instead of warning when encountering
said situation. Many users miss warnings, so an error would be a
stronger signal. This would, however, be too harsh, as it could break
existing user code that is working perfectly fine.

2. We considered adding a bias term to the base layer when it is missing
during the merge. However, this requires careful bookkeeping (e.g. when
unmerging all adapters, the bias needs to be removed again). Moreover,
when calling merge_and_unload, users expect the original model
architecture to be returned. Suddenly adding a bias term would be
unexpected and could lead to errors down the line.
2025-08-07 14:50:13 +02:00
8876664cfe CI: Fix Windows error for low CPU mem usage tests (#2724)
Add tolerances (still quite strict)
2025-08-07 14:49:40 +02:00
6673609479 ENH Support XPU for image clf example (#2722)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-07 11:33:33 +02:00
52cc71df9f ENH Support XPU for semantic-segmentation example (#2721)
Also fixing a few issues in the example.

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-07 11:32:26 +02:00
78bf27dd42 ENH Support XPU for RandLoRA example (#2720)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-07 11:31:42 +02:00
5ef4362e12 ENH Support XPU for QALoRA example (#2719)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-07 11:30:49 +02:00
a7781aa5e0 ENH Support XPU for OFT dreambooth example (#2718)
Also fixing a couple of issues like wrong argument name.

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-07 11:30:22 +02:00
VED
ec5a1c67b0 FEAT Text generation benchmark (#2525)
Similar to #2395, this benchmark serves to compare different PEFT
methods on an equal basis. This time, the goal is to measure metrics
related to text generation, most notably speed and memory usage. The
results should be easy to reproduce and compare.

The actual experimental settings and results have yet to be added.
2025-08-07 10:17:32 +02:00
d7194f869a ENH Support XPU bnb 4bit example (#2714)
---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-06 16:28:56 +02:00
154ef37561 ENH Support XPU for causal LM examples (#2680)
---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-06 16:27:57 +02:00
6a33744cc2 ENH Support XPU for HRA dreambooth example (#2717)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-06 16:27:26 +02:00
db5c00fad2 FIX Poly issue with returned base model (#2702)
Also, add XPU support for Poly example.

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-06 12:16:49 +02:00
e3d8fc98f1 ENH XPU support for conditional generation examples (#2684)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-06 12:15:28 +02:00
6d531c77a4 FIX Issue with XPU for face alignment example (#2713)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-06 12:14:30 +02:00
2d49c6798d ENH Support XPU for MLP LoRA example (#2712)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-06 12:14:03 +02:00
d6ed90e8e2 ENH Support XPU for multi_adapter examples (#2711)
---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-06 12:13:31 +02:00
e0b2ca7977 Bump version to 0.17.1.dev0 after release (#2707) 2025-08-05 13:05:21 +02:00
44f001c695 Use hub_online_once in trainable token tests (#2701)
Also fix a minor import nit where `TrainableTokensWrapper` was not
added to `utils/__init__.py`. Fixed the corresponding imports as well.

Another housekeeping job is to move hub_online_once to testing_utils.py since it has 
grown to be used in a lot of places and testing_utils.py is the better place to keep 
such utilities.
2025-08-05 12:58:55 +02:00
ff12d13be6 FIX Bug in semantic search example (#2706)
Also updated requirements.

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-05 11:49:00 +02:00
2518ceeb15 FIX Deprecations in MiSS example (#2704)
Also, was validated on XPU.

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-05 11:46:28 +02:00
ec7dee024f FIX Small issue in PISSA example (#2703)
Also validated it with XPU.

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-05 11:45:34 +02:00
86feb8c4f9 ENH Support XPU for CPT, EVA, GPU offload (#2694)
---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-05 11:43:53 +02:00
daee6367aa ENH Support XPU for CorDA example (#2687)
---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-05 11:41:42 +02:00
207b27ec2c ENH Support XPU for LoRA-FA example (#2697)
---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-05 11:38:44 +02:00
68265a1583 ENH XPU support for training dreambooth (#2696)
---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-04 11:42:45 +02:00
be8f824d93 ENH XPU support for dna_language_model example (#2689)
---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-04 11:32:25 +02:00
951e720081 ENH XPU support for boft_dreambooth example (#2679)
---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-04 11:17:10 +02:00
49b29c1d1a ENH XPU support for boft/controlnet example (#2674)
---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-08-04 11:15:36 +02:00
48f6493f94 Release 0.17.0 (#2691)
- Bump versions
- Fix a few TODO comments
- A bit of clean up in test_target_paramters.py
2025-08-01 18:44:24 +02:00
337be05f03 ENH: Adapter injection based on state_dict (#2637)
Make it possible to inject the PEFT adapters based on a state_dict
instead of the PEFT config.

See https://github.com/huggingface/diffusers/issues/11874 for context.

Description

Right now, when creating a PEFT adapter like LoRA, the adapter layers
are injected based on the PEFT config, most notably the entries in
`target_modules`, but other arguments also play into this. Generally,
this is a good approach, but it breaks down in some situations. For
instance, in diffusers, we often have the situation that the checkpoint
was created without PEFT/diffusers, thus there is no PEFT config, only
the `state_dict`. To load these checkpoints in diffusers, the current
approach is to reverse-engineer a valid PEFT config based on the keys in
the `state_dict`.

Unfortunately, this is error prone. Moreover, not every combination of
`state_dict` keys can be easily expressed in a PEFT config through a
combination of `target_modules`, `exclude_modules`, etc. Yes, in theory
everything can be expressed by passing `target_modules=<regex_pattern>`,
but reverse-engineering such a regex correctly and efficiently is very
hard (and thus currently not done).

This PR implements a completely different approach to inject adapters.
Instead of relying on the PEFT config to determine which layers to
target, it takes the `state_dict` directly as the source of truth. This
should allow exactly matching what is desired.
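
In terms of usage, the idea looks roughly like this (a hedged sketch: it assumes the new state_dict argument is exposed through inject_adapter_in_model and that the checkpoint uses PEFT's usual key naming):

import torch
import torch.nn as nn
from peft import LoraConfig, inject_adapter_in_model

class Attention(nn.Module):
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.k_proj = nn.Linear(8, 8)

# adapter checkpoint that only contains LoRA weights for q_proj
adapter_state_dict = {
    "q_proj.lora_A.weight": torch.zeros(4, 8),
    "q_proj.lora_B.weight": torch.zeros(8, 4),
}
# the config may be imprecise; with state_dict passed, the keys decide what gets targeted
config = LoraConfig(r=4, target_modules=["q_proj", "k_proj"])
model = inject_adapter_in_model(config, Attention(), state_dict=adapter_state_dict)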

Implementation details

I took care to implement this change in a way that if no `state_dict` is
passed, the exact same code path as previously is taken. The risk of
breaking anything should thus be minimized.

Technically, it is not necessary to pass the `state_dict`, we are only
interested in the keys. I still called the argument `state_dict`, since
that is typically what we have at this point, but this can be easily
changed.

I thought it might be a good idea, if the `state_dict` is used, to still
check what modules would have been targeted if we had used the PEFT
config. Then, the results are compared and a warning is given if they
differ. This allows the user to see if the PEFT config is not correctly
specified. While running some diffusers tests, I never encountered this
warning, which is good. However, if we plan, for instance, to get rid of
all the reverse engineering of the PEFT config in diffusers, it would
make more sense to not give this warning.

Caveats

When the original LoRA model was using `target_parameters`, injecting
from `state_dict` will not work correctly. The problem is that the
`state_dict` looks the same, whether the module or a parameter was
targeted. Therefore, we cannot correctly determine the user's intent.

For now, what I decided to do is:

1. Always assume that `target_modules` is meant, as it's the far more
   common occurrence.
2. When we detect `target_parameters` while using `state_dict` for
   injection, we raise an error.
3. If we don't detect this, injection might just slip through, resulting
   in modules being targeted (if they are valid modules) instead of
   parameters.
4. Document that these two features don't work together.

I think overall, this is not too concerning, as both features are rather
niche and thus unlikely to be used in conjunction.

Related changes

While working on this PR, I made a couple of related, though not
strictly necessary, changes:

- Refactor tests in `test_low_level_api.py` to use pytest instead of
  unittest
- Add default target modules for LoHa and LoKr (just copying LoRA)
- Most PEFT method's model classes like `LoraModel` had an `__init__`
  that effectively just called `super()` with the same arguments. I
  removed these `__init__` methods.
2025-08-01 18:39:53 +02:00
J.L
bb4fb50e2b FEAT Add MiSS as a replacement for Bone. (#2604)
Add MiSS, an evolution of Bone, from https://arxiv.org/abs/2409.15371.

MiSS will replace Bone, which is now deprecated. A script to convert Bone
checkpoints to MiSS checkpoints is included.
2025-08-01 18:37:20 +02:00
a91ec33fc5 Fix not detecting regex-targeted embedding layer (#2649)
This issue was found in PR #2638 and is defined thusly:

> When calling `get_peft_model_state_dict(..., save_embedding_layers="auto")` we check if the
> embedding layer is targeted to determine if the embedding layers need saving. This is not
> done when `PeftConfig.target_modules` is a regex-string, potentially missing to save embeddings.

This is fixed by adding a check similar to the existing query of whether `EMBEDDING_LAYER_NAMES` is
a subset of the defined target modules, only that the regex matching from `BaseTuner.inject_adapter`
is used. To avoid code duplication, the matching was moved to its own utility function
`match_target_against_key`.

The main complication was to define the test cases, as it was non-trivial to find out what
`save_embedding_layers="auto"` entails. I've assembled a list of cases that I think are correct
in the corresponding unit test.
2025-07-31 16:08:32 +02:00
25e5c6b25c FIX Missing device map for facebook/opt-125m (#2675)
Fixes the failing EETQ test in the nightly multi device CI.

In #2612, fixed device_maps were added for multi-GPU training as we
could not rely on device_map="auto". While making that change, one
device_map was missed, namely for facebook/opt-125m, which is used in
the EETQ multi device test. This device_map was now added. This makes
the test pass locally.
2025-07-30 20:02:22 +02:00
5e00266e85 TST: Add more HF Hub model caching (#2682)
A bunch of tests in test_tuners_utils.py didn't use the decorator so
far, which is now fixed. This should hopefully help reduce timeouts.

Moreover, the iris dataset loading is now moved to a module-scoped
fixture (before, it was just loaded on module level). This doesn't help
with caching, but it prevents loading of this dataset when the
corresponding tests are not even run.
2025-07-30 20:02:07 +02:00
46ae69ac29 FIX Small fixes to target_parameters (#2677)
1. Better error message when same layer targeted twice
2. Remove unused attribute num_experts from _LoraParameterProxy
2025-07-30 14:34:04 +02:00
1c853eaaad Fix trainable tokens with fsdp (#2681)
When using FSDP with trainable tokens, there was an error when
retrieving the state_dict of the TrainableTokensWrapper. The reason is
that for the state_dict that is passed to get_peft_model_state_dict, the
FSDP wrapper was already unwrapped, which means the keys don't have the
FSDP-specific prefix. However, in the PEFT code, when looking up keys
from said state_dict, the prefix was not removed. Now it is removed,
making the lookup succeed. The same logic applies to
set_peft_model_state_dict.

I could successfully start training with FSDP and trainable tokens
locally by adjusting the examples/sft script to include trainable
tokens. Checkpoints could be successfully created and resumed from. The
only change I needed to make was to configure use_orig_params=True for
FSDP.
2025-07-30 14:33:53 +02:00
c11a9dfeaa FIX Failing target_parameters param usage count (#2676)
For testing target_parameters, we use a tiny Llama4 model. This model
was refactored in
https://github.com/huggingface/transformers/pull/39501, resulting in one
parameter being accessed an additional time:

https://github.com/huggingface/transformers/pull/39501/files#diff-e668ec07f78afdb2cb805d939e47453757f0b9437436cb860fcb7cb2431c9cf5R69

Therefore, a unit test that relied on how often this parameter was
accessed started failing. This PR updates the count to the correct
number.

Additionally debug print statements that were accidentally left over are
now removed.
2025-07-30 12:29:51 +02:00
92d65cafa5 Update extending vocab docs (#2669)
- Recommends trainable tokens as first measure
- Clarifies a few things about saving embeddings
- Adds full-finetuning as an option of last resort

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-25 13:09:00 +02:00
434651346c ENH: Targeting multiple parameters on the same module (#2665)
When the target_parameters feature for LoRA was introduced in #2638,
there was one gap, namely the possibility to target multiple
nn.Parameters on the same module (there was only a workaround involving
multiple adapters, but that is not user friendly). With this PR, it is
now possible to achieve this.
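
With the change, the usage looks roughly like this (a sketch; the toy module and parameter names are made up for illustration):

import torch
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class Experts(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate_up_proj = nn.Parameter(torch.randn(8, 16))
        self.down_proj = nn.Parameter(torch.randn(16, 8))

    def forward(self, x):
        return (x @ self.gate_up_proj) @ self.down_proj

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.experts = Experts()

    def forward(self, x):
        return self.experts(x)

# two nn.Parameters on the *same* module targeted by a single adapter
config = LoraConfig(r=4, target_parameters=["experts.gate_up_proj", "experts.down_proj"])
model = get_peft_model(ToyModel(), config)
out = model(torch.randn(2, 8))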

The mechanism to enable this is a bit crude, namely allowing multiple
ParamWrappers to be nested. This should generally be fine as long as
only a couple of nn.Parameters are targeted on the same module. When
there are dozens or hundreds, this approach could lead to slowdowns or
other issues.

A side effect of this implementation is that the ParamWrapper, when it
removes the parametrization, now only removes its own parametrization.
When using nn.utils.parametrize.remove_parametrization, it removes all
parametrizations, which is bad when we have nested parametrizations.

Alternative approaches

Some alternative approaches were discussed internally but the chosen one
was considered most practical.

1. Allow more than one adapted parameter per LoRA layer. This would
require nested dicts for the LoRA parameters, something like
self.lora_A[adapter_name][parameter_name]. We don't have this anywhere
so far, and it would probably break implicit assumptions about PEFT
layers in many places (like parsing of state_dict keys), requiring many
adjustments.
2. Have an auxiliary module that contains the individual LoRA layers
that target the individual parameters. This could be the cleanest
solution and would probably be more efficient if there is a huge number
of targeted parameters per module. However, it also brings extra
complexity, as it requires implementing the logic of how to route the
information to the right parameter, and it may be a solution to a
problem that is irrelevant in practice (a large number of targets per
module).
2025-07-24 19:42:19 +02:00
43845f9b14 Method Comparison: Improve formatting/layout of table (#2670)
* Method Comparison: Improve formatting/layout of table

Quick improvement to reduce the dominance of columns like `{peft,train}_config` and make
numbers a bit more readable through proper decimal/thousands formatting.

* Bump gradio version to accommodate required fixes
2025-07-24 19:02:09 +02:00
663b1209fd ENH Llama-Adapters support for GPT2 (#2643)
aka "adaption prompt"
2025-07-24 14:51:16 +02:00
04a5ed7b2f DOC Fix error in code example (#2666) 2025-07-24 12:13:41 +02:00
a795199ffa Update tokenizer parameter in sfttrainer across multiple examples (#2664)
* REFAC Update tokenizer parameter to processing_class in SFTTrainer instances across multiple examples

* REFAC Replace tokenizer parameter with processing_class in Trainer instances across documentation and examples

* Refactor tokenizer parameter to processing_class in various examples

- Updated the Trainer initialization in corda_finetuning.py to use processing_class instead of tokenizer.
- Changed the execution_count to null in image_classification_peft_lora.ipynb.
- Modified the tokenizer parameter to processing_class in image_classification_peft_lora.ipynb.
- Adjusted the tokenizer parameter to processing_class in peft_bnb_whisper_large_v2_training.ipynb.
- Updated the README.md in lorafa_finetune to reflect the change from tokenizer to processing_class in Trainer initialization.

* REFAC Update tokenizer parameter to processing_class in Seq2SeqTrainer instantiation

* REFAC Replace tokenizer parameter with processing_class in README and notebook examples
2025-07-23 15:30:28 +02:00
f650b08abb make method comparison device agnostic, so it can expand to more accelerators like XPU (#2610)
make method comparison device agnostic, so it can expand to more
accelerators like XPU

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-07-22 15:25:56 +02:00
e77924563a FIX Prefix tuning after transformers PR 38635 (#2662)
Due to https://github.com/huggingface/transformers/pull/38635, several
tests involving prefix tuning broke:

https://github.com/huggingface/peft/actions/runs/16417140904/job/46385751329

This PR fixes this by resolving two issues:

1. The _supports_cache_class attribute was removed, we can now assume
that it is True if the attribute does not exist.

2. We had special handling of past_key_values for GPTBigCodeForCausalLM
which is no longer required (nor valid) after that PR, so it is removed
depending on the transformers version.
2025-07-22 13:59:34 +02:00
fa85d10a7f Update README.md (#2659)
Update bibtex entry.
2025-07-21 14:36:02 +02:00
f3b97c3704 FEAT Allow LoRA to target nn.Parameter (#2638)
Normally, nn.Parameter cannot be targeted with LoRA adapters. This can
be problematic, e.g. when there are MoE layers that use nn.Parameter
directly, or when there is nn.Linear but the weight is passed directly
instead of calling forward (e.g. MHA).

It would be possible to craft a solution involving a special LoRA layer
for each of the modules that use nn.Parameter directly (e.g. lora.MHA)
but that doesn't scale. This PR implements a direct way to target
nn.Parameter making use of torch.nn.utils.parametrize.

Using the feature requires passing target_parameters to the LoraConfig.
During the forward pass, when the parameter is accessed, the LoRA
weights are added to the weights while still ensuring that gradients
flow correctly to the LoRA weights.

Right now, only LoRA supports this feature. Moreover, it is not possible
to target multiple parameters of the same module with the same adapter.
A workaround is to use multiple adapters (i.e. with different names).

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-07-15 16:18:46 +02:00
22506a8e42 FIX Deploy method comp app: error in workflow file (#2645)
Fixing the error:

permissions:
  contents: {}
 Check failure on line 11 in .github/workflows/deploy_method_comparison_app.yml

GitHub Actions
/ Deploy "method_comparison" Gradio to Spaces
Invalid workflow file

The workflow is not valid.
.github/workflows/deploy_method_comparison_app.yml (Line: 11, Col: 13):
A mapping was not expected
2025-07-14 14:48:06 +02:00
1c75d96aca FIX: Prompt learning methods modules_to_save issue (#2646)
When using prompt learning methods, modules_to_save was not correctly
set automatically. This is really bad when using, for instance, sequence
classification tasks, which require the classifier layer to be added to
modules_to_save.

The issue was introduced in #2220 where it is wrongly assumed that the
PEFT config always has a modules_to_save attribute, which is not true
for prompt learning. In #2481, this was partly fixed by using getattr to
avoid an error. However, this did not resolve the fundamental issue that
for prompt learning, there is no such attribute, resulting in
modules_to_save not being applied.

This PR proposes to fix this by adding modules_to_save to the prompt
learning configs.
2025-07-14 13:57:33 +02:00
a4f9334f12 FEAT Add SHiRA Adapters (#2584)
Implements: Sparse High Rank Adapters

Paper: https://arxiv.org/abs/2406.13175
2025-07-14 11:16:10 +02:00
35000fda88 Fix #2634: Allow peft_method to be a string (#2635)
The auto-tagging code assumed that every `PeftConfig.peft_type` value is an Enum value, but
when adding custom types without modifying the enum, it is possible to have strings as well
(and the interface supports that).

This change allows for string values of `PeftConfig.peft_type` in the auto-tagging code.
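
The tolerant handling boils down to something like this (a sketch, not the exact auto-tagging code):

from enum import Enum

def peft_type_to_str(peft_type):
    # accept both PeftType enum members and plain strings
    return peft_type.value if isinstance(peft_type, Enum) else str(peft_type)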
2025-07-08 11:13:06 +02:00
0755ab93f6 FIX Faulty OFT parameter device test (#2630)
There is an error in an OFT test because .cpu() is called on a parameter
instead of a module. Calling it on a parameter is not an in-place
operation, so it has no effect.
2025-07-07 15:57:06 +02:00
fa9e429e93 FIX Correctly skip AWQ test based on torch version (#2631)
There is currently an issue with a multi-GPU test using AutoAWQ. Thus,
PR #2529 introduced an unconditional skip for this test. In #2596, a
condition was added to only skip with torch 2.7, as other torch versions
are not affected. However, the is_torch_version function does not
actually match minor and patch versions, so

is_torch_version("==", "2.7")

returns False when using version 2.7.1.

This PR fixes that by checking both "2.7.0" and "2.7.1" explicitly. This
is not very robust in case that there are further patch releases of
PyTorch. However, that is unlikely, and introducing a more general
solution is IMO not worth it just for this instance.
2025-07-07 15:55:37 +02:00
d76f3fe98c FIX Create mask function signature change (#2633)
We use create_mask_for_generate from transformers. It was introduced in
v4.53.0 but in v4.53.1, the function signature was changed to include
position_ids as mandatory argument:

https://github.com/huggingface/transformers/pull/39194

This breaks our function call in PEFT. This PR fixes the function call
by passing position_ids. This in turn would break the function call with
transformers v4.53.0, thus a strict version check is being used for >=
v4.53.1.
2025-07-07 11:46:57 +02:00
b960d259e8 ENH Enable FSDP example for GPTQ quantized model (#2626)
Besides fixes, includes an example script that uses
`hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4`

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-07-07 11:08:03 +02:00
9f01809e70 FEAT: Add GH action to deploy method comparison app (#2625)
* FEAT Add GH action to deploy method comparison app

* Add to git credentials

* Different approach

* More fixes

* Fix for requirements

* Another approach

* Bah

* Change trigger to changes in method_comparison/

Manual trigger still possible

* Update method_comparison/README.md

* Satisfy Zizmor
2025-07-04 14:46:59 +02:00
4ad953aefb Bump version to 0.16.1.dev0 after release (#2632) 2025-07-04 14:46:48 +02:00
307 changed files with 22736 additions and 13749 deletions


@ -24,7 +24,7 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Login to DockerHub
@ -57,7 +57,7 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Login to DockerHub
@ -90,7 +90,7 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Login to DockerHub
@ -123,7 +123,7 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Login to DockerHub


@ -11,7 +11,7 @@ permissions: {}
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@ba4b74d11c46d884a4cf6497687c090f55f027d9 # main from 2025-09-05
with:
commit_sha: ${{ github.sha }}
package: peft


@ -11,7 +11,7 @@ permissions: {}
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@ba4b74d11c46d884a4cf6497687c090f55f027d9 # main from 2025-09-05
with:
commit_sha: ${{ github.event.pull_request.head.sha }}
pr_number: ${{ github.event.number }}


@ -0,0 +1,41 @@
name: Deploy "method_comparison" Gradio to Spaces
on:
push:
branches: [ main ]
paths:
- "method_comparison/**"
workflow_dispatch:
permissions: {}
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
fetch-depth: 0 # full history needed for subtree
persist-credentials: false
- name: Authenticate via ~/.netrc
env:
HF_TOKEN: ${{ secrets.PEFT_INTERNAL_REPO_READ_WRITE }}
run: |
# netrc needs BOTH login and password entries
printf "machine huggingface.co\nlogin hf\npassword ${HF_TOKEN}\n" >> ~/.netrc
chmod 600 ~/.netrc
- name: Deploy method_comparison app to HF Spaces
run: |
cd method_comparison
git init
# Spaces expect requirements.txt
mv requirements-app.txt requirements.txt
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git remote add gradio-app https://huggingface.co/spaces/peft-internal-testing/PEFT-method-comparison
git add .
git commit -m "🚀 Deploy method comparison app from GH action"
git push -f gradio-app HEAD:main


@ -17,13 +17,13 @@ jobs:
transformers-version: ['main', 'latest']
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
persist-credentials: false
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: "3.10"
cache: "pip"
@ -54,13 +54,13 @@ jobs:
diffusers-version: ['main']
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}
persist-credentials: false
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: "3.10"
cache: "pip"


@ -33,7 +33,7 @@ jobs:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Pip install
@ -158,7 +158,7 @@ jobs:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Pip install


@ -30,7 +30,7 @@ jobs:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Pip install
@ -80,7 +80,7 @@ jobs:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Pip install


@ -17,12 +17,12 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Setup Python
uses: actions/setup-python@v4
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: 3.11


@ -16,7 +16,7 @@ jobs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Get changed files
@ -36,7 +36,7 @@ jobs:
needs: get_changed_files
name: Build Docker images on modified files
runs-on: ubuntu-latest
if: ${{ needs.get_changed_files.outputs.matrix }} != ''
if: ${{ needs.get_changed_files.outputs.matrix != '[]' }}
strategy:
fail-fast: false
matrix:
@ -55,7 +55,7 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Build Docker image


@ -15,11 +15,11 @@ jobs:
tests:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Set up Python 3.11
uses: actions/setup-python@v4
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: 3.11
cache: "pip"


@ -19,11 +19,11 @@ jobs:
check_code_quality:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: "3.11"
cache: "pip"
@ -39,18 +39,17 @@ jobs:
tests:
needs: check_code_quality
strategy:
# TODO: remove 'fail-fast' line once timeout issue from the Hub is solved
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
os: ["ubuntu-latest", "macos-13", "windows-latest"]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Model cache
uses: actions/cache/restore@v4
uses: actions/cache/restore@0400d5f644dc74513175e3cd8d07132dd4860809 # v4.2.4
with:
# Avoid caching HF_HOME/modules and Python cache files to prevent interoperability
# issues and potential cache poisioning. We also avoid lock files to prevent runs
@ -69,7 +68,7 @@ jobs:
[ -f "$(which shasum)" ] && SHASUM=shasum
find "${{ env.HF_HOME }}/hub" -type f -exec "$SHASUM" {} \; > cache_content_initial || true
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: ${{ matrix.python-version }}
cache: "pip"
@ -87,8 +86,27 @@ jobs:
run: |
pip install --force-reinstall -U "numpy<2.0.0"
- name: Test with pytest
# MacOS tests are currently too flaky and will fail almost each time. Thus, continue (green checkmark) even if
# they fail, but add a notice so that the failure is not completely silent
continue-on-error: ${{ matrix.os == 'macos-13' }}
shell: bash
run: |
set +e
make test
status=$?
# Post a notice only if this is macOS AND tests failed
if [ "$status" -ne 0 ] && [ "${{ matrix.os }}" = "macos-13" ]; then
{
echo "## ⚠️ macOS tests failed"
echo ""
echo "- OS: ${{ matrix.os }}"
echo "- Python: ${{ matrix.python-version }}"
echo ""
echo "Check the logs from this step for details."
} >> "$GITHUB_STEP_SUMMARY"
fi
# Return the real status. On macOS this won't fail the job because of continue-on-error.
exit $status
- name: Dump cache content and diff
# This is just debug info so that we can monitor if the model cache diverges substantially
# over time and what the diverging model is.
@ -104,7 +122,7 @@ jobs:
# make sure that cache cleaning doesn't break the pipeline
python scripts/ci_clean_cache.py -d || true
- name: Update model cache
uses: actions/cache/save@v4
uses: actions/cache/save@0400d5f644dc74513175e3cd8d07132dd4860809 # v4.2.4
# Only let one runner (preferably the one that covers most tests) update the model cache
# after *every* run. This way we make sure that our cache is never outdated and we don't
# have to keep track of hashes.

View File

@ -35,7 +35,7 @@ jobs:
run:
shell: bash
steps:
- uses: actions/checkout@v4
- uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
ref: ${{ github.event.inputs.branch }}
repository: ${{ github.event.pull_request.head.repo.full_name }}

View File

@ -10,9 +10,9 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
fetch-depth: 0
persist-credentials: false
- name: Secret Scanning
uses: trufflesecurity/trufflehog@d722a7e50645c42123e31fe97761a88ade988db8 # v3.88.25
uses: trufflesecurity/trufflehog@0f58ae7c5036094a1e3e750d18772af92821b503 # v3.90.5

View File

@ -10,7 +10,7 @@ permissions: {}
jobs:
build:
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@ba4b74d11c46d884a4cf6497687c090f55f027d9 # main from 2025-09-05
with:
package_name: peft
secrets:

View File

@ -19,7 +19,7 @@ jobs:
security-events: write
steps:
- name: Checkout repository
uses: actions/checkout@v4
uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
with:
persist-credentials: false
- name: Install zizmor

View File

@ -1,6 +1,6 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.9.2
rev: v0.12.8
hooks:
- id: ruff
args:

View File

@ -42,7 +42,7 @@ Prepare a model for training with a PEFT method such as LoRA by wrapping the bas
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model
device = "cuda"
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_id = "Qwen/Qwen2.5-3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
peft_config = LoraConfig(
@ -65,7 +65,7 @@ To load a PEFT model for inference:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
device = "cuda"
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_id = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device)
@ -181,7 +181,7 @@ To use 🤗 PEFT in your publication, please cite it by using the following BibT
```bibtex
@Misc{peft,
title = {PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
title = {{PEFT}: State-of-the-art Parameter-Efficient Fine-Tuning methods},
author = {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan},
howpublished = {\url{https://github.com/huggingface/peft}},
year = {2022}

View File

@ -126,8 +126,14 @@
title: Trainable Tokens
- local: package_reference/randlora
title: RandLora
- local: package_reference/shira
title: SHiRA
- local: package_reference/c3a
title: C3A
- local: package_reference/miss
title: MiSS
- local: package_reference/road
title: RoAd
title: Adapters
- sections:
@ -137,5 +143,7 @@
title: Helpers
- local: package_reference/hotswap
title: Hotswapping adapters
- local: package_reference/functional
title: Functions for PEFT integration
title: Utilities
title: API reference

View File

@ -134,7 +134,7 @@ The first thing to know is that the script uses DeepSpeed for distributed traini
# trainer
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
processing_class=tokenizer,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,

View File

@ -114,7 +114,7 @@ The first thing to know is that the script uses FSDP for distributed training as
# trainer
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
processing_class=tokenizer,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,

View File

@ -85,9 +85,11 @@ OFT preserves the hyperspherical energy by learning an orthogonal transformation
## Orthogonal Butterfly (BOFT)
[BOFT](https://hf.co/papers/2311.06243) is a method that primarily focuses on preserving a pretrained model's generative performance in the finetuned model. It tries to maintain the same cosine similarity (hyperspherical energy) between all pairwise neurons in a layer because this better captures the semantic information among neurons. This means OFT is more capable at preserving the subject and it is better for controllable generation (similar to [ControlNet](https://huggingface.co/docs/diffusers/using-diffusers/controlnet)).
[BOFT](https://hf.co/papers/2311.06243) is an improved orthogonal finetuning method that focuses on preserving a pretrained model's generative capabilities while being significantly more parameter-efficient than standard OFT. Like OFT, BOFT maintains the same cosine similarity (hyperspherical energy) between all pairwise neurons in a layer by applying an orthogonal transformation to the pretrained weight matrix, ensuring the semantic relationships among neurons are preserved.
OFT preserves the hyperspherical energy by learning an orthogonal transformation for neurons to keep the cosine similarity between them unchanged. In practice, this means taking the matrix product of an orthogonal matrix with the pretrained weight matrix. However, to be parameter-efficient, the orthogonal matrix is represented as a block-diagonal matrix with rank `r` blocks. Whereas LoRA reduces the number of trainable parameters with low-rank structures, OFT reduces the number of trainable parameters with a sparse block-diagonal matrix structure.
Instead of using a block-diagonal orthogonal matrix, BOFT factorizes the orthogonal transformation into a product of **sparse butterfly matrices** (originally introduced in the [CooleyTukey FFT](https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm)). Unlike OFT's block-diagonal rotations, which only mix inputs within each block, the butterfly structure guarantees that every input can influence every output, producing a **dense connectivity** with just `O(d log d)` parameters. This factorization preserves expressivity while drastically reducing the parameter count compared to OFT (at the expense of computation time).
In practice, BOFT multiplies each pretrained weight matrix by a sequence of butterfly-structured orthogonal factors, enabling efficient and expressive neuron rotations. This makes BOFT well-suited for controllable generation and tasks where maintaining the pretrained model's subject representation is critical, while also scaling to larger models with lower memory and compute overhead.
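To make this concrete, a BOFT adapter can be configured through PEFT roughly as sketched below; the block size, butterfly factor, and target module names are illustrative placeholders rather than recommendations:
```python
from transformers import AutoModelForCausalLM
from peft import BOFTConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# Butterfly-factorized orthogonal finetuning: the block size and the number of
# butterfly factors trade off expressivity against trainable parameter count.
config = BOFTConfig(
    boft_block_size=4,
    boft_n_butterfly_factor=2,
    boft_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```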
## Adaptive Low-Rank Adaptation (AdaLoRA)
@ -122,12 +124,16 @@ HRA constructs a chain of `r` trainable Householder reflections (HRs). Because t
The higher `r`, the more trainable parameters, resulting in a larger model capacity and better performance. Besides, due to the chain structure, the orthogonality of HR planes impacts the capacity and regularity of HRA. To achieve a trade-off between the model capacity and regularity, an orthogonality regularizer of the HR planes is added to the loss function. The weight \\(\lambda\\) can control the strength of the regularizer.
## Bone
[DiSHA](https://huggingface.co/papers/2409.15371) (Dimension-Sharding Adaptation) is a novel PEFT technique distinct from LoRA. By dividing the original weights into multiple subspaces that share a single matrix for weight updates, DiSHA simplifies the process by requiring the trainable matrix to be initialized to zero, eliminating the need for complex initialization as in some LoRA variants. Bone and Bat are derivative structures of DiSHA. Bone significantly improves computational efficiency while saving memory, whereas Bat addresses the limitation of Bone's linear update by employing a non-linear update to break through the upper bound.
The paper has since been revised as [MiSS: Balancing LoRA Performance and Efficiency with Simple Shard Sharing](https://huggingface.co/papers/2409.15371).
If you already have a Bone checkpoint, you can use `/scripts/convert-bone-to-miss.py` to convert it into a MiSS checkpoint and proceed with training using MiSS.
<small><a href="https://huggingface.co/papers/2409.15371">DiSHA: Dimension-Sharding Adaptation with Fast Convergence and Fast Computation</a></small>
## MiSS
[MiSS](https://huggingface.co/papers/2409.15371) MiSS (Matrix Shard Sharing) is a novel Parameter-Efficient Fine-Tuning (PEFT) method designed to address the trade-off between adaptability and efficiency in Large Language Models. The core approach of MiSS involves a simple shard-sharing mechanism. It achieves low-rank adaptation by decomposing a weight matrix into multiple fragments and then utilizing a shared, trainable "common fragment." The final low-rank update matrix is constructed by replicating these shared, partitioned shards. (MiSS is a novel PEFT method that adopts a low-rank structure, requires only a single trainable matrix, and introduces a new update mechanism distinct from LoRA, achieving an excellent balance between performance and efficiency.)
Intuitively, the shape of a single trainable matrix in Bone is consistent with `lora_B`, so the `r` parameter in Bone is less than the `r` in LoRA by (`in_feature * r`).
<small><a href="https://huggingface.co/papers/2409.15371">MiSS: Balancing LoRA Performance and Efficiency with Simple Shard Sharing</a></small>
Note: Bat's r (b) is special and requires that weight W satisfies the conditions `in_features % r == 0` and `out_features % r == 0`. Additionally, when `in_features == out_features` and Bone-r equals LoRA-r, Bone's number of trainable parameters is only half that of LoRA.
Intuitively, the shape of a single trainable matrix in MiSS is consistent with `lora_B`, so the `r` parameter in MiSS is less than the `r` in LoRA by (`in_feature * r`).
Although the nonlinear updates of Bat bring some performance improvements, they also increase computational overhead. Its main purpose is to provide researchers with a direction for improvement. Therefore, we recommend fine-tuning the comprehensive Bone model instead.
Note: Bat's r (b) is special and requires that weight W satisfies the conditions `in_features % r == 0` and `out_features % r == 0`. Additionally, when `in_features == out_features` and MiSS-r equals LoRA-r, MiSS's number of trainable parameters is only half that of LoRA.
Although the nonlinear updates of Bat bring some performance improvements, they also increase computational overhead. Its main purpose is to provide researchers with a direction for improvement. Therefore, we recommend fine-tuning the comprehensive MiSS model instead.
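As a rough sketch of how MiSS is applied in PEFT (assuming `MissConfig` is importable from the package root; the rank and target modules are illustrative):
```python
from transformers import AutoModelForCausalLM
from peft import MissConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

# A single trainable shard is shared across the partitions of each targeted weight matrix.
config = MissConfig(r=64, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```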

View File

@ -123,7 +123,7 @@ trainer = SFTTrainer(
model=model,
train_dataset=ds['train'],
peft_config=peft_config,
tokenizer=tokenizer,
processing_class=tokenizer,
args=training_arguments,
data_collator=collator,
)

View File

@ -109,7 +109,7 @@ peft_config = LoraConfig(
```
The parameter `rho` (≥ 1.0) determines how much redistribution is allowed. When `rho=1.0` and `r=16`, LoRA adapters are limited to exactly 16 ranks, preventing any redistribution from occurring. A recommended value for EVA with redistribution is 2.0, meaning the maximum rank allowed for a layer is 2r.
It is recommended to perform EVA initialization on a GPU as it is much faster. To optimize the amount of available memory for EVA, you can use the `low_cpu_mem_usage` flag in [`get_peft_model`]:
It is recommended to perform EVA initialization on an accelerator(e.g. CUDA GPU, Intel XPU) as it is much faster. To optimize the amount of available memory for EVA, you can use the `low_cpu_mem_usage` flag in [`get_peft_model`]:
```python
peft_model = get_peft_model(model, peft_config, low_cpu_mem_usage=True)
```
@ -173,6 +173,111 @@ from peft import LoraConfig
config = LoraConfig(use_rslora=True, ...)
```
### Activated LoRA (aLoRA)
Activated LoRA (aLoRA) is a low rank adapter architecture for Causal LMs that allows for reusing existing base model KV cache for more efficient inference. This approach is best suited for inference pipelines which rely on the base model for most tasks/generations, but use aLoRA adapter(s) to perform specialized task(s) within the chain. For example, checking or correcting generated outputs of the base model. In these settings, inference times can be sped up by an order of magnitude or more. For more information on aLoRA and many example use cases, see https://huggingface.co/papers/2504.12397.
This technique scans for the last occurrence of an invocation sequence (`alora_invocation_tokens`) in each input (this can be as short as 1 token), and activates the adapter weights on tokens starting with the beginning of the invocation sequence (any inputs after the invocation sequence are also adapted, and all generated tokens will use the adapted weights). Weights on prior tokens are left un-adapted -- making the cache for those tokens interchangeable with base model cache due to the causal attention mask in Causal LMs. Usage is very similar to standard LoRA, with the key difference that this invocation sequence must be specified when the adapter is created:
```py
from peft import LoraConfig
config = LoraConfig(alora_invocation_tokens=alora_invocation_tokens, task_type="CAUSAL_LM", ...)
```
where `alora_invocation_tokens` is a list of integer token ids. Given a desired invocation string, this can be obtained as
```
invocation_string = "placeholder"
alora_invocation_tokens = tokenizer.encode(invocation_string, add_special_tokens=False)
```
where the tokenizer is the tokenizer for the base model. Note that we have `add_special_tokens=False` to avoid adding SOS/EOS tokens in our search string (which will most likely cause failure to find).
**Notes**
* aLoRA is only supported for `task_type=CAUSAL_LM` tasks due to its focus on cache reuse.
* Since the weights are adapted on fewer tokens, often (not always) aLoRA requires higher rank (`r`) than LoRA. `r=32` can be a good starting point.
* aLoRA weights cannot be merged into the base model by definition, since the adapter weights are selectively applied to a subset of tokens. Attempts to merge will throw errors.
* Beam search is not yet supported.
* It is generally not recommended to add new tokens to the tokenizer that are not present in the base model, as this can complicate the target use case of both the base model and adapter model operating on overlapping context. That said, there is a possible workaround by first efficiently adding [trainable tokens](https://huggingface.co/docs/peft/en/package_reference/trainable_tokens) to the base model prior to training the adapter.
#### Choice of invocation sequence and SFT design
Each input must have the `alora_invocation_tokens` sequence present, it is not added automatically. To maximize model performance without compromising cache reuse, it is recommended to have the adapter weights activated early, i.e. at the start of any adapter-specific prompting, but after any long inputs such as prior generations or documents. As with any model,
formatting should be consistent between train and test.
Consider the following example, where the base model has a chat template,
and the goal is to train the adapter to generate a desired output.
* Option 1: If there is no task-specific prompt, i.e. the input is a chat history with the `assistant` prompt, then the chat template's `assistant` prompt (e.g. `<|start_of_role|>assistant<|end_of_role|>`) is a natural choice for the invocation string. See the model's chat template to find the prompt for the model.
* Option 2: If there is a task-specific prompt for the adapter that describes the task the adapter is learning, and that prompt is put as a `user` turn immediately prior to the generation, then the chat template's `user` prompt (e.g. `<|start_of_role|>user<|end_of_role|>`) is a natural choice for the invocation string.
Once deciding on an invocation string, get the model tokenizer and obtain `alora_invocation_tokens` as
```
alora_invocation_tokens = tokenizer.encode(invocation_string, add_special_tokens=False)
```
An example inference setup is at [alora finetuning](https://github.com/huggingface/peft/blob/main/examples/alora_finetuning/alora_finetuning.py).
**Note** If using custom strings for the invocation string, make sure that the start and end of the string are special tokens to avoid issues with tokenization at the boundaries.
To see why, imagine that 'a', 'b', 'c', and 'ab' are tokens in your tokenizer (numbers 1, 2, 3, 4 respectively). Suppose that your alora_invocation_tokens = [2, 3]. Now imagine your input string is "abc". Because "ab" is a token, this will get tokenized as [4,3]. So the alora_invocation_tokens will fail to be found, despite the string "bc" being in it. If the start and end of the invocation string are special tokens, however, this failure case will never happen since special tokens are never tokenized into the same token with other characters.
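A simple way to sanity-check this is to verify that the invocation tokens appear as a contiguous subsequence of a tokenized prompt; the helper below is an illustrative sketch, not part of PEFT:
```python
def contains_subsequence(haystack: list[int], needle: list[int]) -> bool:
    # True if `needle` occurs as a contiguous run inside `haystack`
    return any(
        haystack[i : i + len(needle)] == needle
        for i in range(len(haystack) - len(needle) + 1)
    )

input_ids = tokenizer(prompt)["input_ids"]
assert contains_subsequence(input_ids, alora_invocation_tokens), "invocation sequence lost during tokenization"
```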
#### Using (and reusing) cache for generation
The main purpose of Activated LoRA is to make KV cache interchangeable between the base model and aLoRA adapter models **prior to the invocation sequence** since base and adapted KV values are not compatible. Specifically, keys and values stored during one model generation can be used in subsequent generations to avoid expensive prefill operations for context tokens. When sharing cache between the base model and aLoRA adapters, there are 2 main patterns:
1. The base model has generated something, and an aLoRA adapter is then called to do a followup generation. Example: the base model answers a question, and an aLoRA trained to detect hallucinations checks the base model response.
2. An aLoRA adapter has generated something, and the base model or a different aLoRA adapter is called to do a followup generation where there is partial context overlap with the original aLoRA. Example: The user provides a query, and an aLoRA rewrites the query to be more self-contained and improve retrieval in a RAG system. Then, documents are retrieved and loaded into context, an aLoRA checks if these documents are indeed relevant to the question, and then the base model generates an answer.
To demonstrate the above behaviors when using caching, we're using [DynamicCache](https://huggingface.co/docs/transformers/en/kv_cache) from `transformers`. Care must be taken to ensure that adapted cache values are not mixed with base cache values. In particular, an extra step is required for sharing the cache when there is partial context overlap (pattern 2).
**Pattern 1: Base model followed by aLoRA** Here, the entire input and generation from the base model is input into the aLoRA adapter, along with the invocation sequence:
```
from transformers import DynamicCache
...
cache = DynamicCache()
inputs_base = tokenizer(prompt_base, return_tensors="pt")
# Generate from base model and save cache
with model_alora.disable_adapter():
    output = model_alora.generate(inputs_base["input_ids"].to(device), attention_mask=inputs_base["attention_mask"].to(device), past_key_values=cache, return_dict_in_generate=True)
output_text_base = tokenizer.decode(output.sequences[0])
cache = output.past_key_values
# Generate with aLoRA adapter from cache
prompt_alora = output_text_base + INVOCATION_STRING
inputs_alora = tokenizer(prompt_alora, return_tensors="pt").to(device)
output = model_alora.generate(**inputs_alora, past_key_values=cache)
output_text_alora = tokenizer.decode(output[0])
# Note: cache is now tainted with adapter values and cannot be used in base model from here on!
```
**Pattern 2: aLoRA generation followed by base model (or another aLoRA) with partial context overlap** Here, we prefill the shared context using the base model, and then generate.
```
from transformers import DynamicCache
import copy
...
cache = DynamicCache()
inputs_shared = tokenizer(prompt_shared, return_tensors="pt").to(device)
# Prefill from base model and save cache
with model_alora.disable_adapter():
    with torch.no_grad():
        model_alora(**inputs_shared, past_key_values=cache)
cache_copy = copy.deepcopy(cache)
# Generate from aLoRA using prefilled cache
prompt_alora = prompt_shared + INVOCATION_STRING
inputs_alora = tokenizer(prompt_alora, return_tensors="pt").to(device)
output = model_alora.generate(**inputs_alora, past_key_values=cache)
output_text_alora = tokenizer.decode(output[0])
# Generate from base model using saved cache not tainted by aLoRA KV values
prompt_base = prompt_shared
inputs_base = tokenizer(prompt_base, return_tensors="pt").to(device)
with model_alora.disable_adapter():
    output = model_alora.generate(**inputs_base, past_key_values=cache_copy)
output_text_base = tokenizer.decode(output[0])
```
### Weight-Decomposed Low-Rank Adaptation (DoRA)
@ -203,7 +308,7 @@ model = PeftModel.from_pretrained(base_model, peft_model_id, ephemeral_gpu_offlo
DoRA is optimized (computes faster and takes less memory) for models in the evaluation mode, or when dropout is set to 0. We reuse the
base result at those times to get the speedup.
Running [dora finetuning](https://github.com/huggingface/peft/blob/main/examples/dora_finetuning/dora_finetuning.py)
with `CUDA_VISIBLE_DEVICES=0 time python examples/dora_finetuning/dora_finetuning.py --quantize --lora_dropout 0 --batch_size 16 --eval_step 2 --use_dora`
with `CUDA_VISIBLE_DEVICES=0 ZE_AFFINITY_MASK=0 time python examples/dora_finetuning/dora_finetuning.py --quantize --lora_dropout 0 --batch_size 16 --eval_step 2 --use_dora`
on a 4090 with gradient accumulation set to 2 and max step to 20 resulted with the following observations:
| | Without Optimization | With Optimization |
@ -269,6 +374,18 @@ Outer(
The same logic applies to `alpha_pattern`. If you're in doubt, don't try to get fancy with regular expressions -- just pass the full name for each module with a different rank/alpha, preceded by the `^` prefix, and you should be good.
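For example, to give one specific module a higher rank and alpha while everything else keeps the defaults (the module name below is illustrative):
```python
from peft import LoraConfig

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    # Anchor the full module name with `^` to avoid accidental partial matches
    rank_pattern={"^model.layers.0.self_attn.q_proj": 32},
    alpha_pattern={"^model.layers.0.self_attn.q_proj": 64},
)
```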
### Targeting `nn.Parameter` directly
> [!WARNING]
> This feature is experimental and subject to change.
Generally, you should use `target_modules` to target the module (e.g. `nn.Linear`). However, in some circumstances, this is not possible. E.g., in many mixture of experts (MoE) layers in HF Transformers, instead of using `nn.Linear`, an `nn.Parameter` is used. PEFT normally overwrites the `forward` method for LoRA, but for `nn.Parameter`, there is none. Therefore, to apply LoRA to that parameter, it needs to be targeted with `target_parameters`. As an example, for [Llama4](https://huggingface.co/collections/meta-llama/llama-4-67f0c30d9fe03840bc9d0164), you can pass: `target_parameters=['feed_forward.experts.gate_up_proj', 'feed_forward.experts.down_proj']`.
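In config form, this could look roughly like the following sketch (the parameter names are the Llama4 ones from the example above; adjust them to your model):
```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    # Experimental: target nn.Parameter entries directly instead of modules
    target_parameters=[
        "feed_forward.experts.gate_up_proj",
        "feed_forward.experts.down_proj",
    ],
)
```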
#### Caveats
- At the moment, this argument allows targeting 2-dim or 3-dim `nn.Parameter`s. It is assumed that in the case of a 3-dim parameter, the 0th dimension is the expert dimension.
- It is currently not possible to add multiple LoRA adapters (via `model.add_adapter` or `model.load_adapter`) that use `target_parameters` at the same time.
## Optimizers
LoRA training can optionally include special purpose optimizers. Currently PEFT supports LoRA-FA and LoRA+.
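For instance, a LoRA+ optimizer, which trains the LoRA B matrices with a larger learning rate than the A matrices, can be created along the following lines (the learning rate and ratio are placeholder values):
```python
import torch
from peft.optimizers import create_loraplus_optimizer

# `peft_model` is assumed to be a model that already has LoRA adapters applied
optimizer = create_loraplus_optimizer(
    model=peft_model,
    optimizer_cls=torch.optim.AdamW,
    lr=5e-5,                 # base learning rate for the LoRA A matrices
    loraplus_lr_ratio=16,    # B matrices are trained with lr * loraplus_lr_ratio
)
```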
@ -361,7 +478,7 @@ special_tokens = ['<|start_think|>', '<|stop_think|>']
tokenizer.add_special_tokens({'additional_special_tokens': special_tokens})
# make room for new tokens in the embedding matrix if it isn't big enough already
base_model.resize_token_embeddings(max(len(tokenizer), base_model.model.embed_tokens.num_embeddings)
base_model.resize_token_embeddings(max(len(tokenizer), base_model.model.embed_tokens.num_embeddings))
# typical LoRA config with `trainable_token_indices` targeting embedding layer `embed_tokens`
# and specifically our new tokens we just added
@ -462,11 +579,13 @@ There are several supported methods for `combination_type`. Refer to the [docume
Now, perform inference:
```python
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
prompt = "Hey, are you conscious? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
inputs = {k: v.to(device) for k, v in inputs.items()}
with torch.no_grad():
generate_ids = model.generate(**inputs, max_length=30)
@ -568,3 +687,148 @@ Using this feature has some drawbacks, namely:
- Increase the batch size.
- Try to avoid having a large number of different adapters in the same batch, prefer homogeneous batches. This can be achieved by buffering samples with the same adapter and only perform inference with a small handful of different adapters.
- Take a look at alternative implementations such as [LoRAX](https://github.com/predibase/lorax), [punica](https://github.com/punica-ai/punica), or [S-LoRA](https://github.com/S-LoRA/S-LoRA), which are specialized to work with a large number of different adapters.
## Composing and Reusing LoRA Adapters
### Arrow
[Arrow](https://huggingface.co/papers/2405.11157) is a modular routing algorithm designed to combine multiple pre-trained task-specific LoRA adapters to solve a given task. Rather than merging all adapters naively, Arrow introduces a **gradient-free, token-wise mixture-of-experts (MoE) routing mechanism**. At inference time, it first computes a _prototype_ for each LoRA by extracting the top right singular vector from its SVD decomposition. Each token representation is then compared to these prototypes via cosine similarity to obtain routing coefficients. Tokens are assigned to the top-k most relevant LoRA adapters, with the coefficients normalized through softmax, and their outputs linearly combined. This allows effective reuse of existing LoRA modules for new tasks and leads to stronger zero-shot generalization.
In PEFT, Arrow is enabled through ```ArrowConfig``` and ```create_arrow_model```. You can also configure parameters such as ```top_k``` (the number of LoRA adapters combined per token), ```router_temperature``` (the softmax temperature applied to the routing coefficients), and ```rng_seed``` (for reproducibility).
```py
from peft import create_arrow_model, ArrowConfig
from transformers import AutoModelForCausalLM
# Loading the model
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
# Creating the Arrow config
arrow_config = ArrowConfig(
top_k=3,
router_temperature=1.0,
rng_seed=42,
)
# The LoRA adapters below were trained on a clustered FLAN dataset.
# Task clustering was performed using the Model-Based Clustering (MBC) method,
# as described in the Arrow paper.
# While one could train a separate LoRA for each task and let Arrow route tokens among them,
# training LoRAs on clusters of tasks instead provides an indirect optimization for
# transfer across the multi-task dataset.
task_specific_adapter_paths = [
f"TahaBa/phi3-mini-clustered-flan/ts_expert_{i}" for i in range(10)
]
# Creating the Arrow model
model = create_arrow_model(
base_model=base_model,
task_specific_adapter_paths=task_specific_adapter_paths,
arrow_config=arrow_config,
)
# Now the forward path could be called on this model, like a normal PeftModel.
```
Furthermore, you can add or remove adapters after calling ```create_arrow_model```—for example, to fine-tune a new adapter or discard an unnecessary one. Once the adapters are in place, you can activate the ```"arrow_router"``` for inference to use Arrow. Note that if you add a new LoRA adapter after ```create_arrow_model``` and want to fine-tune it, you must explicitly set the new adapter as active, since ```"arrow_router"``` is activated by default in ```create_arrow_model```.
```py
from trl import SFTTrainer, SFTConfig
# Adding a new adapter and activating it
model.add_adapter(adapter_name='new_adapter')
model.set_adapter('new_adapter')
# Now the model could be trained along the `new_adapter`.
trainer = SFTTrainer(
model=model,
args=SFTConfig(...),
...
)
# Once the training is done, you can activate `arrow_router` and use it in inference
model.set_adapter('arrow_router') # Model is ready to be used at inference time now
```
### GenKnowSub
[GenKnowSub](https://aclanthology.org/2025.acl-short.54/) augments Arrow by purifying task-specific LoRA adapters before routing. The key idea is to subtract general knowledge encoded in LoRA space—based on the [forgetting-via-negation principle](https://huggingface.co/papers/2212.04089)—so that task adapters become more isolated and focused on task-relevant signals. Concretely, GenKnowSub estimates a low-dimensional “general” subspace from a set of general (non-task-specific) LoRA adapters and removes this component from each task adapter's LoRA update prior to Arrow's token-wise routing. This typically improves compositionality and reduces interference when combining many task adapters.
In PEFT, enable GenKnowSub by setting ```use_gks=True``` in ArrowConfig, and providing ```general_adapter_paths``` in ```create_arrow_model```:
```py
from peft import create_arrow_model, ArrowConfig
from transformers import AutoModelForCausalLM
# Loading the model
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
# Creating the Arrow config
arrow_config = ArrowConfig(
top_k=3,
router_temperature=1.0,
use_gks=True,
rng_seed=42,
)
# Path to task-specific, trained on flan clustered dataset (as we explained before.)
task_specific_adapter_paths = [
f"TahaBa/phi3-mini-clustered-flan/ts_expert_{i}" for i in range(10)
]
# These general adapters are trained on English, German, and French Wikipedia dataset,
# with a causal language modelling objective, each pair like (507-token sentence, 5-token completion), and the loss computed on the completion
general_adapter_paths = [
"TahaBa/phi3-mini-general-adapters/cluster0_batch16_prop1.0_langen/checkpoint-17",
"TahaBa/phi3-mini-general-adapters/cluster0_batch16_prop1.0_langfr/checkpoint-35",
"TahaBa/phi3-mini-general-adapters/cluster0_batch16_prop1.0_langger/checkpoint-17"
]
# Creating the Arrow model
model = create_arrow_model(
base_model=base_model,
task_specific_adapter_paths=task_specific_adapter_paths,
general_adapter_paths=general_adapter_paths,
arrow_config=arrow_config,
)
# Now the forward path could be called on this model, like a normal PeftModel.
```
To encode general knowledge, GenKnowSub subtracts the average of the provided general adapters from each task-specific adapter once, before routing begins. Furthermore, the ability to add or remove adapters after calling ```create_arrow_model``` (as described in the Arrow section) is still supported in this case.
<Tip>
**Things to keep in mind when using Arrow + GenKnowSub:**
- All LoRA adapters (task-specific and general) must share the same ```rank``` and ```target_modules```.
- Any inconsistency in these settings will raise an error in ```create_arrow_model```.
- Having different scaling factors (```lora_alpha```) across task adapters is supported — Arrow handles them automatically.
- Merging the ```"arrow_router"``` is not supported, due to its dynamic routing behavior.
- In create_arrow_model, task adapters are loaded as ```task_i``` and general adapters as ```gks_j``` (where ```i``` and ```j``` are indices). The function ensures consistency of ```target_modules```, ```rank```, and whether adapters are applied to ```Linear``` or ```Linear4bit``` layers. It then adds the ```"arrow_router"``` module and activates it. Any customization of this process requires overriding ```create_arrow_model```.
- This implementation is compatible with 4-bit quantization (via bitsandbytes):
```py
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch
# Quantisation config
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=False,
)
# Loading the model
base_model = AutoModelForCausalLM.from_pretrained(
"microsoft/Phi-3-mini-4k-instruct",
torch_dtype=torch.bfloat16,
device_map="auto",
quantization_config=bnb_config,
)
# Now call create_arrow_model() as we explained before.
```
</Tip>

View File

@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
# Adapter injection
With PEFT, you can inject trainable adapters into any `torch` module which allows you to use adapter methods without relying on the modeling classes in PEFT. Currently, PEFT supports injecting [LoRA](../conceptual_guides/adapter#low-rank-adaptation-lora), [AdaLoRA](../conceptual_guides/adapter#adaptive-low-rank-adaptation-adalora), and [IA3](../conceptual_guides/ia3) into models because for these adapters, inplace modification of the model is sufficient for finetuning it.
With PEFT, you can inject trainable adapters into any `torch` module which allows you to use adapter methods without relying on the modeling classes in PEFT. This works for all adapters except for those based on prompt learning (e.g. prefix tuning or p-tuning).
Check the table below to see when you should inject adapters.
@ -87,6 +87,28 @@ DummyModel(
)
```
### Injection based on a `state_dict`
Sometimes, it is possible that there is a PEFT adapter checkpoint but the corresponding PEFT config is not known for whatever reason. To inject the PEFT layers for this checkpoint, you would usually have to reverse-engineer the corresponding PEFT config, most notably the `target_modules` argument, based on the `state_dict` from the checkpoint. This can be cumbersome and error prone. To avoid this, it is also possible to call [`inject_adapter_in_model`] and pass the loaded `state_dict` as an argument:
```python
from safetensors.torch import load_file
model = ...
state_dict = load_file(<path-to-safetensors-file>)
lora_config = LoraConfig(...)
model = inject_adapter_in_model(lora_config, model, state_dict=state_dict)
```
In this case, PEFT will use the `state_dict` as the reference for which layers to target instead of using the PEFT config. As a user, you don't have to set the exact `target_modules` of the PEFT config for this to work. However, you should still pass a PEFT config of the right type, in this example `LoraConfig`; the `target_modules` can be left as `None`.
Be aware that this still only creates the uninitialized PEFT layers; the values from the `state_dict` are not used to populate the model weights. To populate the weights, proceed with calling [`set_peft_model_state_dict`] as described below.
⚠️ Note that if there is a mismatch between what is configured in the PEFT config and what is found in the `state_dict`, PEFT will warn you about this. You can ignore the warning if you know that the PEFT config is not correctly specified.
> [!WARNING]
> If the original PEFT adapter was using `target_parameters` instead of `target_modules`, injecting from a `state_dict` will not work correctly. In this case, it is mandatory to use the correct PEFT config for injection.
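Putting the pieces together, a minimal sketch of injecting from a `state_dict` and then populating the weights could look like this (the checkpoint path is a placeholder):
```python
from safetensors.torch import load_file
from peft import LoraConfig, inject_adapter_in_model, set_peft_model_state_dict

state_dict = load_file("adapter_model.safetensors")  # placeholder path

# `model` is the base model; `target_modules` can stay None because the state_dict is the reference
lora_config = LoraConfig()
model = inject_adapter_in_model(lora_config, model, state_dict=state_dict)

# Now load the actual adapter weights into the freshly injected (still uninitialized) layers
set_peft_model_state_dict(model, state_dict)
```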
## Saving the model
To only save the adapter, use the [`get_peft_model_state_dict`] function:

View File

@ -99,12 +99,13 @@ Now you can use the merged model as an instruction-tuned model to write ad copy
<hfoption id="instruct">
```py
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
messages = [
{"role": "user", "content": "Write an essay about Generative AI."},
]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
inputs = {k: v.to(device) for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.95, temperature=0.2, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
```
@ -113,13 +114,14 @@ print(tokenizer.decode(outputs[0]))
<hfoption id="ad copy">
```py
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
messages = [
{"role": "system", "content": "Create a text ad given the following product and description."},
{"role": "user", "content": "Product: Sony PS5 PlayStation Console\nDescription: The PS5 console unleashes new gaming possibilities that you never anticipated."},
]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
inputs = {k: v.to(device) for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.95, temperature=0.2, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))
```
@ -128,13 +130,15 @@ print(tokenizer.decode(outputs[0]))
<hfoption id="SQL">
```py
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
text = """Table: 2-11365528-2
Columns: ['Team', 'Head Coach', 'President', 'Home Ground', 'Location']
Natural Query: Who is the Head Coach of the team whose President is Mario Volarevic?
SQL Query:"""
inputs = tokenizer(text, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
inputs = {k: v.to(device) for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1, eos_token_id=tokenizer("</s>").input_ids[-1])
print(tokenizer.decode(outputs[0]))
```

View File

@ -197,7 +197,9 @@ The models that are quantized using Half-Quadratic Quantization of Large Machine
```python
from hqq.engine.hf import HQQModelForCausalLM
quantized_model = HQQModelForCausalLM.from_quantized(save_dir_or_hfhub, device='cuda')
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
quantized_model = HQQModelForCausalLM.from_quantized(save_dir_or_hfhub, device=device)
peft_config = LoraConfig(...)
quantized_model = get_peft_model(quantized_model, peft_config)
```

View File

@ -145,9 +145,37 @@ As an example, when loading a model that is using the DeBERTa architecture for s
### Extending the vocabulary
For many language fine-tuning tasks, extending the model's vocabulary is necessary since new tokens are being introduced. This requires extending the embedding layer to account for the new tokens and also storing the embedding layer in addition to the adapter weights when saving the adapter.
For many language fine-tuning tasks, extending the model's vocabulary is necessary since new tokens are being introduced. This requires extending the embedding layer to account for the new tokens and, depending on the fine-tuning method, also storing the embedding layer in addition to the adapter weights when saving the adapter. There are a few ways of achieving this, ordered by parameter efficiency:
Save the embedding layer by adding it to the `target_modules` of the config. The embedding layer name must follow the standard naming scheme from Transformers. For example, the Mistral config could look like this:
- [trainable tokens](../package_reference/trainable_tokens), train only the specified tokens, optionally store only the updated values
- training an adapter on the embedding matrix, optionally store only the updated values
- full-finetuning of the embedding layer
#### Using trainable tokens
Let's start with trainable tokens, in this case via its [LoRA integration](../developer_guides/lora#efficiently-train-tokens-alongside-lora). If you're only interested in training the new embeddings and nothing else, refer to the [standalone documentation](../package_reference/trainable_tokens).
To enable selective token training of the embedding layer, you'll need to supply the token ids of your newly added tokens via the `trainable_token_indices` parameter. Optionally you can specify which layer to target if there is more than one embedding layer. For a Mistral model this could look like this:
```python
new_tokens = ['<think>', '</think>']
tokenizer.add_tokens(new_tokens)
base_model.resize_token_embeddings(len(tokenizer))
lora_config = LoraConfig(
...,
trainable_token_indices={'embed_tokens': tokenizer.convert_tokens_to_ids(new_tokens)},
)
```
If your model uses tied weights (such as the `lm_head`), trainable tokens will try to resolve those and keep them updated as well, so in that case there should be no need for adding `modules_to_save=["lm_head"]`. This only works if the model uses the Transformers convention for tying weights.
Saving the model with `model.save_pretrained` may save the full embedding matrix instead of
only the difference as a precaution because the embedding matrix was resized. To save space you can disable this behavior by setting `save_embedding_layers=False` when calling `save_pretrained`. This is safe to do as long as you don't modify the embedding matrix through other means as well, as such changes will not be tracked by trainable tokens.
#### Using an adapter, e.g. LoRA
Prepare the embedding layer by adding it to the `target_modules` of your adapter config. For example, the Mistral config could look like this:
```python
config = LoraConfig(..., target_modules=["embed_tokens", "lm_head", "q_proj", "v_proj"])
@ -155,7 +183,7 @@ config = LoraConfig(..., target_modules=["embed_tokens", "lm_head", "q_proj", "v
Once added to `target_modules`, PEFT automatically stores the embedding layer when saving the adapter if the model has the [`~transformers.PreTrainedModel.get_input_embeddings`] and [`~transformers.PreTrainedModel.get_output_embeddings`] methods. This is generally the case for Transformers models.
If the model's embedding layer doesn't follow the Transformer's naming scheme, you can still save it by manually passing `save_embedding_layers=True` when saving the adapter:
If the model's embedding layer doesn't follow the Transformer's naming scheme but nevertheless implements `get_input_embeddings`, you can still save it by manually passing `save_embedding_layers=True` when saving the adapter:
```python
model = get_peft_model(...)
@ -167,6 +195,14 @@ For inference, load the base model first and resize it the same way you did befo
For a complete example, please check out [this notebook](https://github.com/huggingface/peft/blob/main/examples/causal_language_modeling/peft_lora_clm_with_additional_tokens.ipynb).
#### Full fine-tuning
Full fine-tuning is more costly in terms of VRAM or storage space but if all else fails, you can fall back to this and see if it works for you. Achieve it by adding the name of the embedding layer to `modules_to_save`. Note that you need to add tied layers as well, e.g. `lm_head`. Example for a Mistral model with LoRA:
```python
config = LoraConfig(..., modules_to_save=["embed_tokens", "lm_head"], target_modules=["q_proj", "v_proj"])
```
### Getting a warning about "weights not being initialized from the model checkpoint"
When you load your PEFT model which has been trained on a task (for example, classification), you may get a warning like:

View File

@ -0,0 +1,33 @@
<!--⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Functions for PEFT integration
A collection of functions that could be useful for non-PeftModel models, e.g. for transformers or diffusers integrations.
The functions provided here can be considered "public API" of PEFT and hence are safe to be used by packages that provide PEFT integrations.
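As a rough sketch of how an integrating library might combine these functions (the base `model` and `adapter_state_dict` are assumed to come from the integration itself):
```python
from peft import LoraConfig
from peft.functional import (
    get_peft_model_state_dict,
    inject_adapter_in_model,
    set_peft_model_state_dict,
)

config = LoraConfig(target_modules=["q_proj", "v_proj"])
model = inject_adapter_in_model(config, model)        # create the (uninitialized) LoRA layers in-place
set_peft_model_state_dict(model, adapter_state_dict)  # load previously saved adapter weights
adapter_sd = get_peft_model_state_dict(model)         # extract only the adapter weights for saving
```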
## Cast the adapter weight dtypes
[[autodoc]] functional.cast_adapter_dtype
- all
## Delete the PEFT adapter from model
[[autodoc]] functional.delete_adapter
- all
## Get the state dict of the PEFT adapter
[[autodoc]] functional.get_peft_model_state_dict
- all
## Inject a PEFT adapter into the model based on a PEFT config
[[autodoc]] functional.inject_adapter_in_model
- all
## Set the active PEFT adapter(s) of the model
[[autodoc]] functional.set_adapter
- all
## Load the weights of the PEFT state dict into the model
[[autodoc]] functional.set_peft_model_state_dict
- all

View File

@ -32,6 +32,10 @@ The abstract from the paper is:
## Utility
### ArrowConfig
[[autodoc]] tuners.lora.config.ArrowConfig
### LoftQ
[[autodoc]] utils.loftq_utils.replace_lora_weights_loftq

View File

@ -0,0 +1,32 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# MiSS
MiSS: Balancing LoRA Performance and Efficiency with Simple Shard Sharing ([MiSS](https://huggingface.co/papers/2409.15371)) is a novel PEFT method that adopts a low-rank structure, requires only a single trainable matrix, and introduces a new update mechanism distinct from LoRA, achieving an excellent balance between performance and efficiency.
The abstract from the paper is:
*Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), effectively reduce the number of trainable parameters in Large Language Models (LLMs). However, as model scales continue to grow, the demand for computational resources remains a significant challenge. Existing LoRA variants often struggle to strike an optimal balance between adaptability (model performance and convergence speed) and efficiency (computational overhead, memory usage, and initialization time). This paper introduces MiSS(Matrix Shard Sharing ), a novel PEFT approach that addresses this trade-off through a simple shard-sharing mechanism. MiSS leverages the insight that a low-rank adaptation can be achieved by decomposing the weight matrix into multiple fragment matrices and utilizing a shared, trainable common fragment. This method constructs the low-rank update matrix through the replication of these shared, partitioned shards. We also propose a hardware-efficient and broadly applicable implementation for MiSS. Extensive experiments conducted on a range of tasks, alongside a systematic analysis of computational performance, demonstrate MiSS's superiority. The results show that MiSS significantly outperforms standard LoRA and its prominent variants in both model performance metrics and computational efficiency, including initialization speed and training throughput. By effectively balancing expressive power and resource utilization, MiSS offers a compelling solution for efficiently adapting large-scale models*.
## MissConfig
[[autodoc]] tuners.miss.config.MissConfig
## MissModel
[[autodoc]] tuners.miss.model.MissModel

View File

@ -0,0 +1,31 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# RoAd
[RoAd](https://arxiv.org/pdf/2409.00119) is a parameter-efficient finetuning technique that adapts large language models by learning a small set of 2×2 rotation matrices (and optional scaling factors) applied to pairs of hidden dimensions. RoAd achieves competitive or superior performance compared to other PEFT methods with under 0.1% trainable parameters. Unlike LoRA's batched low-rank updates, RoAd's sparse rotations reformulate to simple element-wise operations, yielding significantly higher serving throughput when handling heterogeneous requests in the same batch, i.e. serving multiple adapters simultaneously. Moreover, RoAd integrates seamlessly into a distributed interchange intervention framework, interpreting its sparse 2D rotations as task-specific interventions within learned subspaces of hidden representations. These orthogonal subspaces can be composed to merge multiple task-specific behaviors—like multilingual capabilities or instruction following—without additional fine-tuning, enabling modular, interpretable adaptations in LLMs.
Finetuning with RoAd typically requires a higher learning rate than LoRA or similar methods, around 1e-3. Currently RoAd only supports linear layers, and it can be used on models quantized with bitsandbytes (4-bit or 8-bit).
For running inference with different RoAd adapters in the same batch see [Inference with different LoRA adapters in the same batch](../developer_guides/lora#inference-with-different-lora-adapters-in-the-same-batch).
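A minimal usage sketch, assuming `RoadConfig` is exported at the package root and showing only generic config arguments since the RoAd-specific ones are documented in [`RoadConfig`] below (the model and target modules are illustrative):
```python
from transformers import AutoModelForCausalLM
from peft import RoadConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
config = RoadConfig(target_modules=["q_proj", "v_proj"])
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # typically well under 0.1% trainable parameters
```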
## RoadConfig
[[autodoc]] tuners.road.config.RoadConfig
## RoadModel
[[autodoc]] tuners.road.model.RoadModel

View File

@ -0,0 +1,35 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Sparse High Rank Adapters
Sparse High Rank Adapters or [SHiRA](https://arxiv.org/abs/2406.13175) is an alternate type of adapter and has been found to have significant advantages over low-rank adapters. Specifically, SHiRA achieves better accuracy than LoRA for a variety of vision and language tasks. It also offers simpler and higher quality multi-adapter fusion by significantly reducing concept loss, a common problem faced by low-rank adapters. SHiRA directly finetunes a small number of the base model's parameters to finetune the model on any adaptation task.
SHiRA currently has the following constraint:
- Only `nn.Linear` layers are supported.
The abstract from the paper is:
> Low Rank Adaptation (LoRA) has gained massive attention in the recent generative AI research. One of the main advantages of LoRA is its ability to be fused with pretrained models, adding no overhead during inference. However, from a mobile deployment standpoint, we can either avoid inference overhead in the fused mode but lose the ability to switch adapters rapidly, or suffer significant (up to 30% higher) inference latency while enabling rapid switching in the unfused mode. LoRA also exhibits concept-loss when multiple adapters are used concurrently. In this paper, we propose Sparse High Rank Adapters (SHiRA), a new paradigm which incurs no inference overhead, enables rapid switching, and significantly reduces concept-loss. Specifically, SHiRA can be trained by directly tuning only 1-2% of the base model weights while leaving others unchanged. This results in a highly sparse adapter which can be switched directly in the fused mode. We further provide theoretical and empirical insights on how high sparsity in SHiRA can aid multi-adapter fusion by reducing concept loss. Our extensive experiments on LVMs and LLMs demonstrate that finetuning only a small fraction of the parameters in the base model significantly outperforms LoRA while enabling both rapid switching and multi-adapter fusion. Finally, we provide a latency- and memory-efficient SHiRA implementation based on Parameter-Efficient Finetuning (PEFT) Library which trains at nearly the same speed as LoRA while consuming up to 16% lower peak GPU memory, thus making SHiRA easy to adopt for practical use cases. To demonstrate rapid switching benefits during inference, we show that loading SHiRA on a base model can be 5x-16x faster than LoRA fusion on a CPU.
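A minimal usage sketch (assuming `ShiraConfig` is exported at the package root; `r` and the target modules are illustrative):
```python
from transformers import AutoModelForCausalLM
from peft import ShiraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
# Only nn.Linear layers can be targeted; `r` controls how many base weights are made trainable
config = ShiraConfig(r=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base_model, config)
model.print_trainable_parameters()
```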
## ShiraConfig
[[autodoc]] tuners.shira.config.ShiraConfig
## ShiraModel
[[autodoc]] tuners.shira.model.ShiraModel

View File

@ -33,6 +33,13 @@ Note that this method does not add tokens for you, you have to add tokens to the
embedding matrix of the model accordingly. This method will only re-train the embeddings for the tokens you specify.
This method can also be used in conjunction with LoRA layers! See [the LoRA developer guide](../developer_guides/lora#efficiently-train-tokens-alongside-lora).
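A short standalone sketch, assuming the new tokens have already been added to the tokenizer and the embedding matrix resized accordingly (`tokenizer` and `base_model` are placeholders):
```python
from peft import TrainableTokensConfig, get_peft_model

new_token_ids = tokenizer.convert_tokens_to_ids(["<think>", "</think>"])  # example tokens
config = TrainableTokensConfig(token_indices=new_token_ids)
model = get_peft_model(base_model, config)  # only the listed token embeddings are trainable
```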
> [!TIP]
> Saving the model with [`~PeftModel.save_pretrained`] or retrieving the state dict using
> [`get_peft_model_state_dict`] when adding new tokens may save the full embedding matrix instead of only the difference
> as a precaution because the embedding matrix was resized. To save space you can disable this behavior by setting
> `save_embedding_layers=False` when calling `save_pretrained`. This is safe to do as long as you don't modify the
> embedding matrix through other means as well, as such changes will not be tracked by trainable tokens.
## TrainableTokensConfig
[[autodoc]] tuners.trainable_tokens.config.TrainableTokensConfig

View File

@ -90,7 +90,7 @@ trainer = Trainer(
args=training_args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["test"],
tokenizer=tokenizer,
processing_class=tokenizer,
data_collator=data_collator,
compute_metrics=compute_metrics,
)

View File

@ -92,7 +92,7 @@ processed_ds = ds.map(
)
```
Create a training and evaluation [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), and set `pin_memory=True` to speed up data transfer to the GPU during training if your dataset samples are on a CPU.
Create a training and evaluation [`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), and set `pin_memory=True` to speed up data transfer to the accelerator during training if your dataset samples are on a CPU.
```py
from torch.utils.data import DataLoader
@ -159,12 +159,12 @@ lr_scheduler = get_linear_schedule_with_warmup(
)
```
Move the model to the GPU and create a training loop that reports the loss and perplexity for each epoch.
Move the model to the accelerator and create a training loop that reports the loss and perplexity for each epoch.
```py
from tqdm import tqdm
device = "cuda"
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model = model.to(device)
for epoch in range(num_epochs):
@ -219,7 +219,9 @@ To load the model for inference, use the [`~AutoPeftModelForSeq2SeqLM.from_pretr
```py
from peft import AutoPeftModelForSeq2SeqLM
model = AutoPeftModelForSeq2SeqLM.from_pretrained("<your-hf-account-name>/mt0-large-ia3").to("cuda")
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model = AutoPeftModelForSeq2SeqLM.from_pretrained("<your-hf-account-name>/mt0-large-ia3").to(device)
tokenizer = AutoTokenizer.from_pretrained("bigscience/mt0-large")
i = 15

View File

@ -281,7 +281,7 @@ trainer = Trainer(
args,
train_dataset=train_ds,
eval_dataset=val_ds,
tokenizer=image_processor,
processing_class=image_processor,
data_collator=collate_fn,
)
trainer.train()

View File

@ -43,7 +43,13 @@ Use the [`~datasets.load_dataset`] function to load the dataset and create a new
```py
from datasets import load_dataset
ds = load_dataset("ought/raft", "twitter_complaints")
ds = load_dataset(
"parquet",
data_files={
"train": "hf://datasets/ought/raft@refs/convert/parquet/twitter_complaints/train/0000.parquet",
"test": "hf://datasets/ought/raft@refs/convert/parquet/twitter_complaints/test/0000.parquet"
}
)
classes = [k.replace("_", " ") for k in ds["train"].features["Label"].names]
ds = ds.map(

View File

@ -0,0 +1,76 @@
# Activated LoRA (aLoRA)
## Introduction
Activated LoRA (aLoRA) is an adapter that selectively activates its weights only after a given invocation sequence, ensuring that hidden states match the base model prior to this point. This allows reusing the base model KVs (stored in the KV cache) for tokens before the invocation,
enabling much faster real-world inference (e.g. with vLLM) when switching between generation with the base model and generation with adapters.
See the [paper](https://huggingface.co/papers/2504.12397) for more details.
## Quick start (shown for Mistral 7B)
```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, DataCollatorForLanguageModeling
from datasets import load_dataset
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3", device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
dataset = load_dataset("Lots-of-LoRAs/task1660_super_glue_question_generation", split="train")
invocation_string = "[/INST]" # End of user turn in Mistral chat template
invocation_tokens = tokenizer.encode(invocation_string, add_special_tokens=False)
lora_config = LoraConfig(
task_type="CAUSAL_LM",
alora_invocation_tokens=invocation_tokens,
r=32,
target_modules=["q_proj", "k_proj", "v_proj"],
)
peft_model = get_peft_model(model, lora_config)
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
trainer = Trainer(
model=peft_model,
train_dataset=dataset,
tokenizer=tokenizer,
data_collator=data_collator,
)
trainer.train()
peft_model.save_pretrained("alora-mistral-7b")
```
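After training, the adapter can be loaded back onto the base model for inference. The snippet below is a minimal sketch (the prompt and generation length are arbitrary) and skips the KV-cache reuse demonstrated in the paper:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3", device_map="cuda")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
alora_model = PeftModel.from_pretrained(base_model, "alora-mistral-7b")

# The adapter weights only activate after the invocation sequence ("[/INST]"),
# so hidden states before that point are identical to the base model's.
chat = [{"role": "user", "content": "Generate a question about the solar system."}]
text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(base_model.device)

with torch.no_grad():
    output = alora_model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```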
### Use the training example script directly
Pass the invocation string with `--invocation_string` when running the training example
script. For Mistral 7B, do:
```bash
python examples/alora_finetuning/alora_finetuning.py --base_model mistralai/Mistral-7B-Instruct-v0.3 --data_path Lots-of-LoRAs/task1660_super_glue_question_generation --invocation_string "[/INST]"
```
and similarly for Llama-3.2-3B-Instruct:
```bash
python examples/alora_finetuning/alora_finetuning.py --base_model meta-llama/Llama-3.2-3B-Instruct --data_path Lots-of-LoRAs/task1660_super_glue_question_generation --invocation_string "<|start_header_id|>assistant<|end_header_id|>"
```
### Full example of the script
```bash
python alora_finetuning.py \
--base_model "PATH_TO_MODEL" \
--data_path "PATH_TO_DATASET" \
--output_dir "PATH_TO_OUTPUT_DIR" \
--batch_size 1 \
--num_epochs 3 \
--learning_rate 3e-4 \
--cutoff_len 512 \
--val_set_size 500 \
--invocation_string "[/INST]" \
--quantize \
--eval_step 10 \
--save_step 100 \
--device "cuda:0" \
--lora_r 32 \
--lora_alpha 32 \
--lora_dropout 0.05 \
--lora_target_modules "q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj" \
--hub_model_id "YOUR_HF_REPO" \
--push_to_hub
```

View File

@ -0,0 +1,251 @@
import os
import torch
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
DataCollatorForLanguageModeling,
Trainer,
TrainingArguments,
)
from peft import LoraConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
def train_model(
base_model: str,
data_path: str,
output_dir: str,
batch_size: int,
num_epochs: int,
learning_rate: float,
cutoff_len: int,
val_set_size: int,
invocation_string: str,
quantize: bool,
eval_step: int,
save_step: int,
device: str,
lora_r: int,
lora_alpha: int,
lora_dropout: float,
lora_target_modules: str,
hub_model_id: str,
push_to_hub: bool,
):
os.environ["TOKENIZERS_PARALLELISM"] = "false"
hf_token = os.getenv("HF_TOKEN")
device = torch.device(device)
print(f"Using device: {device}")
tokenizer = AutoTokenizer.from_pretrained(base_model, token=hf_token)
tokenizer.pad_token = tokenizer.unk_token
invocation_tokens = tokenizer.encode(invocation_string, add_special_tokens=False)
if quantize:
model = AutoModelForCausalLM.from_pretrained(
base_model,
token=hf_token,
quantization_config=BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=(
torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float16
),
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
),
)
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
else:
model = AutoModelForCausalLM.from_pretrained(base_model, token=hf_token)
lora_config = LoraConfig(
task_type="CAUSAL_LM",
alora_invocation_tokens=invocation_tokens,
r=lora_r,
lora_alpha=lora_alpha,
target_modules=(lora_target_modules.split(",") if lora_target_modules else ["q_proj", "k_proj", "v_proj"]),
lora_dropout=lora_dropout,
bias="none",
)
model = get_peft_model(model, lora_config)
model.to(device)
tokenizer.pad_token = tokenizer.eos_token
dataset = load_dataset(data_path)
def tokenize_function(examples):
formatted_texts = [
tokenizer.apply_chat_template(
[
{"role": "user", "content": user_msg},
{"role": "assistant", "content": assistant_msg},
],
tokenize=False, # get plain text first
add_generation_prompt=False,
)
for user_msg, assistant_msg in zip(examples["input"], examples["output"])
]
# Tokenize the chat-formatted texts
model_inputs = tokenizer(
formatted_texts,
padding="max_length",
truncation=True,
max_length=cutoff_len,
)
labels = []
for ids in model_inputs["input_ids"]:
labels.append([(token_id if token_id != tokenizer.pad_token_id else -100) for token_id in ids])
model_inputs["labels"] = labels
return model_inputs
# Tokenize the dataset and prepare for training
tokenized_datasets = dataset.map(tokenize_function, batched=True, remove_columns=dataset["train"].column_names)
# Data collator to dynamically pad the batched examples
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
training_args = TrainingArguments(
output_dir=output_dir,
num_train_epochs=num_epochs,
per_device_train_batch_size=batch_size,
per_device_eval_batch_size=batch_size,
warmup_steps=100,
weight_decay=0.01,
logging_dir="./logs",
logging_steps=eval_step,
save_steps=save_step,
save_total_limit=2,
push_to_hub=push_to_hub,
hub_model_id=hub_model_id,
gradient_accumulation_steps=16,
fp16=True,
learning_rate=learning_rate,
hub_token=hf_token,
)
torch.cuda.empty_cache()
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["test"],
data_collator=data_collator,
)
trainer.train()
if push_to_hub:
trainer.push_to_hub(commit_message="Fine-tuned model")
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
def model_inference(model_path: str, adapter_path: str, prompt: str = None, data_path: str = None):
"""
Simple inference with the tuned aLoRA adapter. Demonstrates that the aLoRA adapter can (but does
not need to) reuse the KV cache created by the base model, perhaps during a prior generation turn.
Purely for demonstration purposes. See the [paper](https://huggingface.co/papers/2504.12397)
for realistic multiturn cache reuse examples.
"""
if prompt is None:
# Use first row of test data
dataset = load_dataset(data_path)
prompt = dataset["test"][0]["input"]
tokenizer = AutoTokenizer.from_pretrained(model_path)
base_model = AutoModelForCausalLM.from_pretrained(model_path)
alora_model = PeftModel.from_pretrained(base_model, adapter_path)
chat = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(base_model.device)
# Generate answer with adapter
output_dict = alora_model.generate(**inputs, return_dict_in_generate=True, max_new_tokens=20)
alora_outputs = output_dict.sequences
# Print results
print(f"Prompt: {text}")
response = tokenizer.decode(alora_outputs[0][inputs["input_ids"].shape[1] :], skip_special_tokens=True)
print(f"Trained adapter response: {response}")
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description="Fine-tune Mistral with Activated LoRA")
parser.add_argument(
"--base_model", type=str, default="mistralai/Mistral-7B-Instruct-v0.3", help="Base model path or name"
)
parser.add_argument(
"--data_path",
type=str,
default="Lots-of-LoRAs/task1660_super_glue_question_generation",
help="Dataset path or name",
)
parser.add_argument(
"--output_dir", type=str, default="path/to/output", help="Output directory for the fine-tuned model"
)
parser.add_argument("--batch_size", type=int, default=2, help="Batch size")
parser.add_argument("--num_epochs", type=int, default=1, help="Number of training epochs")
parser.add_argument("--learning_rate", type=float, default=1e-4, help="Learning rate")
parser.add_argument("--cutoff_len", type=int, default=2048, help="Cutoff length for tokenization")
parser.add_argument("--val_set_size", type=int, default=500, help="Validation set size")
parser.add_argument(
"--invocation_string",
type=str,
default="[/INST]",
help="String that activates the aLoRA adapter. Model dependent.",
)
parser.add_argument("--quantize", action="store_true", help="Use quantization")
parser.add_argument("--eval_step", type=int, default=10, help="Evaluation step interval")
parser.add_argument("--save_step", type=int, default=100, help="Save step interval")
parser.add_argument("--device", type=str, default="cuda:0", help="Device to use for training")
parser.add_argument("--lora_r", type=int, default=32, help="LoRA rank")
parser.add_argument("--lora_alpha", type=int, default=32, help="LoRA alpha")
parser.add_argument("--lora_dropout", type=float, default=0.05, help="LoRA dropout rate")
parser.add_argument(
"--lora_target_modules", type=str, default=None, help="Comma-separated list of target modules for LoRA"
)
parser.add_argument(
"--hub_model_id",
type=str,
default="path/to/repo",
help="Repository name to push the model on the Hugging Face Hub",
)
parser.add_argument("--push_to_hub", action="store_true", help="Whether to push the model to Hugging Face Hub")
args = parser.parse_args()
train_model(
base_model=args.base_model,
data_path=args.data_path,
output_dir=args.output_dir,
batch_size=args.batch_size,
num_epochs=args.num_epochs,
learning_rate=args.learning_rate,
cutoff_len=args.cutoff_len,
val_set_size=args.val_set_size,
invocation_string=args.invocation_string,
quantize=args.quantize,
eval_step=args.eval_step,
save_step=args.save_step,
device=args.device,
lora_r=args.lora_r,
lora_alpha=args.lora_alpha,
lora_dropout=args.lora_dropout,
lora_target_modules=args.lora_target_modules,
hub_model_id=args.hub_model_id,
push_to_hub=args.push_to_hub,
)
print("Model trained. Running test inference.")
model_inference(model_path=args.base_model, adapter_path=args.output_dir, data_path=args.data_path)

View File

@ -0,0 +1,375 @@
# Copyright 2025-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
This script provides a simple evaluation pipeline for multiple-choice reasoning datasets
(e.g., BoolQ, HellaSwag, ARC, OpenBookQA, Winogrande) with different composition strategies.
Usage examples:
python arrow_phi3_mini.py --strategy base --ds_name arc-challenge
python arrow_phi3_mini.py --strategy arrow --ds_name boolq
python arrow_phi3_mini.py --strategy gks --ds_name hswag
Key features:
- Supports three strategies:
"base" → Evaluate the quantized base model directly
"arrow" → Use Arrow modular routing with task-specific adapters
"gks" → Use Arrow + GenKnowSub (subtracting general-domain knowledge)
- Loads evaluation datasets from the Hugging Face Hub
- Implements a batched evaluation loop that computes per-option likelihoods and selects
the answer with the lowest average loss
- Reports simple accuracy
Implementation details:
- The base model is quantized to 4-bit using `BitsAndBytesConfig` (nf4, bf16 compute).
- For Arrow and GKS, task-specific adapters are loaded from the Hugging Face Hub:
TahaBa/phi3-mini-clustered-flan/ts_expert_i
- Task-specific adapters were trained on 10 clusters of FLAN tasks.
- The clusters were created using Model-Based Clustering (MBC):
1. Train a LoRA adapter for each individual task.
2. Apply k-means clustering to group tasks based on these adapters.
3. Train a LoRA adapter for each resulting cluster.
For more details, see the Arrow paper: https://huggingface.co/papers/2405.11157
- For GKS, general adapters are loaded from:
TahaBa/phi3-mini-general-adapters/...
- These adapters were trained on English, French, and German Wikipedia data
using a causal language modeling objective with (507-token context → 5-token completion) pairs.
- This setup encodes general knowledge into the LoRA space, which can then be
subtracted from task-specific adapters during inference to isolate and purify them.
For more details, see the GenKnowSub paper: https://huggingface.co/papers/2505.10939
- `evaluate_on_multi_choice_batched` handles tokenization, masking context tokens,
and computing per-choice log-likelihoods for fair comparison.
- Accuracy is printed at the end for the selected dataset.
This script is mainly meant for demonstration purposes and lightweight evaluation,
not full-scale benchmarking (batch size / max length can be tuned).
=======================================================================================
Results (evaluated with microsoft/Phi-3-mini-4k-instruct, 4-bit quantization):
| Dataset | Base Acc. | Arrow Acc. | Arrow+GKS Acc. |
|--------------|-----------|------------|----------------|
| ARC-Challenge| 0.4515 | 0.5418 | 0.5585 |
| ARC-Easy | 0.6894 | 0.8404 | 0.8473 |
| Winogrande | 0.5769 | 0.6550 | 0.6724 |
| BoolQ | 0.8146 | 0.8030 | 0.8247 |
| OpenBookQA | 0.43 | 0.448 | 0.472 |
| HellaSwag | 0.7318 | 0.7150 | 0.7376 |
Observations:
- Arrow generally improves over the base model by routing tokens to the most relevant task adapters.
- Applying GKS (general knowledge subtraction) consistently gives further gains compared to Arrow and Base.
These numbers are not meant as leaderboard results, but as a sanity check
to verify that the implementation works as expected and demonstrates
the benefits of Arrow and GenKnowSub.
"""
import argparse
import random
import numpy as np
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import ArrowConfig, create_arrow_model
MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
MODEL_MAX_LEN = 2048
def parse_args():
parser = argparse.ArgumentParser(description="Training script with strategy selection")
parser.add_argument(
"--strategy",
type=str,
choices=["base", "arrow", "gks"],
default="base",
help="Training strategy to use: base, arrow, or gks",
)
parser.add_argument(
"--ds_name",
type=str,
choices=["boolq", "hswag", "arc-easy", "arc-challenge", "oqa", "wg"],
default="arc-challenge",
help="Dataset to use: boolq, hswag, arc-easy, arc-challenge, oqa, wg",
)
return parser.parse_args()
def read_test_dataset(ds_name):
if ds_name == "boolq":
ds = load_dataset("google/boolq", split="validation", trust_remote_code=True)
elif ds_name == "hswag":
ds = load_dataset("Rowan/hellaswag", split="validation", trust_remote_code=True)
elif ds_name == "arc-challenge":
ds = load_dataset("allenai/ai2_arc", "ARC-Challenge", split="validation", trust_remote_code=True)
elif ds_name == "arc-easy":
ds = load_dataset("allenai/ai2_arc", "ARC-Easy", split="validation", trust_remote_code=True)
elif ds_name == "oqa":
ds = load_dataset("allenai/openbookqa", split="validation", trust_remote_code=True)
elif ds_name == "wg":
ds = load_dataset("allenai/winogrande", "winogrande_xl", split="validation", trust_remote_code=True)
else:
raise f"Dataset {ds_name} is not supported yet."
return ds
def extract_input_content(ds_name, row):
if ds_name == "boolq":
return f"[passage]{row['passage']}[question]{row['question']}"
if ds_name == "hswag":
return row["ctx"]
if (ds_name == "arc-challenge") or (ds_name == "arc-easy"):
return row["question"]
if ds_name == "oqa":
return row["question_stem"]
if ds_name == "wg":
return row["sentence"]
def create_multi_choice_options(row, ds_name):
options_texts = []
content = extract_input_content(ds_name, row)
if ds_name == "boolq":
choices = ["true", "false"]
if ds_name == "hswag":
choices = row["endings"]
if (ds_name == "arc-challenge") or (ds_name == "arc-easy"):
choices = row["choices"]["text"]
if ds_name == "wg":
choices = [row["option1"], row["option2"]]
if ds_name == "oqa":
choices = row["choices"]["text"]
for choice in choices:
options_texts.append(f"<|user|>\n{content}<|end|>\n<|assistant|>{choice}<|end|>\n")
return options_texts
def extract_multi_choice_target_index(row, ds_name):
if ds_name == "boolq":
return 0 if row["answer"] is True else 1
if ds_name == "hswag":
return int(row["label"])
if (ds_name == "arc-challenge") or (ds_name == "arc-easy"):
return row["choices"]["label"].index(row["answerKey"])
if ds_name == "wg":
return int(row["answer"]) - 1
if ds_name == "oqa":
return row["choices"]["label"].index(row["answerKey"])
def set_seed(seed: int):
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
def compute_loglike_loss(logits, labels, reduction="none"):
bs = logits.size(0)
vocab_size = logits.size(-1)
labels = labels.squeeze(-1)
shift_logits = logits[..., :-1, :].contiguous()
shift_labels = labels[..., 1:].contiguous()
# Flatten the tokens
loss_fct = torch.nn.CrossEntropyLoss(reduction=reduction)
shift_logits = shift_logits.view(-1, vocab_size)
shift_labels = shift_labels.view(-1)
shift_labels = shift_labels.to(shift_logits.device)
loss = loss_fct(shift_logits, shift_labels)
# reshape back
if reduction == "none":
loss = loss.view((bs, -1))
non_zero_loss = (loss != 0).sum(dim=-1)
non_zero_loss[non_zero_loss == 0] = 1
loss = loss.sum(dim=-1) / non_zero_loss
return loss.float() # Convert to float32 before returning
def evaluate_on_multi_choice_batched(
eval_dataset, model, tokenizer, ds_name, labels, predictions, args, batch_size=32, max_length=512, device="cuda"
):
# Batched evaluation: score every answer option by its average token loss and pick the argmin per sample
model.eval()
for start in tqdm(
range(0, len(eval_dataset), batch_size), total=(len(eval_dataset) + batch_size - 1) // batch_size
):
rows = [eval_dataset[i] for i in range(start, min(start + batch_size, len(eval_dataset)))]
# Build the flattened option texts for this batch
all_texts = []
options_per_sample = [] # number of options for each sample
ctx_lens_per_option = [] # context length replicated per option
for row in rows:
# options: ["<|user|>...<|assistant|>choiceA<|end|>", ...]
options = create_multi_choice_options(row, ds_name)
options_per_sample.append(len(options))
# compute context length once per sample (minus 1 to align with the shifted labels)
content = extract_input_content(ds_name, row)
context_prompt = f"<|user|>\n{content}<|end|>\n<|assistant|>"
ctx_len = len(tokenizer.encode(context_prompt)) - 1
all_texts.extend(options)
ctx_lens_per_option.extend([ctx_len] * len(options))
# collect gold label
labels.append(extract_multi_choice_target_index(row, ds_name))
# Tokenize all options in one go
tokenized = tokenizer(
all_texts,
return_tensors="pt",
padding=True,
truncation=True,
max_length=max_length,
)
tokenized = {k: v.to(device) for k, v in tokenized.items()}
# Create masked labels: ignore context and padding
masked_labels = tokenized["input_ids"].clone()
for i, ctx_len in enumerate(ctx_lens_per_option):
masked_labels[i, :ctx_len] = -100
masked_labels[tokenized["attention_mask"] == 0] = -100
with torch.no_grad():
logits = model(input_ids=tokenized["input_ids"], attention_mask=tokenized["attention_mask"]).logits
# per-sequence losses
losses = compute_loglike_loss(logits, masked_labels, reduction="none").detach().cpu()
# Reduce per sample (argmin across its options)
idx = 0
for n_opt in options_per_sample:
pred = torch.argmin(losses[idx : idx + n_opt]).item()
predictions.append(pred)
idx += n_opt
print(
f"Accuracy for dataset {args.ds_name} and strategy {args.strategy} is: {accuracy_score(labels, predictions)}"
)
if __name__ == "__main__":
args = parse_args()
print(f"Selected strategy: {args.strategy}")
print(f"Dataset name: {args.ds_name}")
# Loading the tokeniser
tokenizer = AutoTokenizer.from_pretrained(
MODEL_NAME,
use_fast=True,
padding_side="right",
model_max_length=MODEL_MAX_LEN,
)
# Quantisation config
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=False,
)
# Loading the model
base_model = AutoModelForCausalLM.from_pretrained(
MODEL_NAME,
torch_dtype=torch.bfloat16,
device_map="auto",
quantization_config=bnb_config,
)
# Loading the test dataset
test_dataset = read_test_dataset(args.ds_name)
print(f"{args.ds_name} is loaded with size: {len(test_dataset)}.")
labels, predictions = [], []
if args.strategy == "base":
# Batch-wise inference
with torch.no_grad():
evaluate_on_multi_choice_batched(
test_dataset,
base_model,
tokenizer,
args.ds_name,
labels,
predictions,
args,
batch_size=64, # tune this
max_length=512, # tune if options are long
device="cuda",
)
else:
general_adapter_paths = []
if args.strategy == "gks":
arrow_config = ArrowConfig(
top_k=3,
router_temperature=1.0,
use_gks=True,
)
# General adapter paths from the hub
general_adapter_paths = [
"TahaBa/phi3-mini-general-adapters/cluster0_batch16_prop1.0_langen/checkpoint-17",
"TahaBa/phi3-mini-general-adapters/cluster0_batch16_prop1.0_langfr/checkpoint-35",
"TahaBa/phi3-mini-general-adapters/cluster0_batch16_prop1.0_langger/checkpoint-17",
]
else:
arrow_config = ArrowConfig(
top_k=3,
router_temperature=1.0,
)
# Task-specific adapter paths from the hub
task_specific_adapter_paths = [f"TahaBa/phi3-mini-clustered-flan/ts_expert_{i}" for i in range(10)]
# Creating the Arrow model
model = create_arrow_model(
base_model=base_model,
task_specific_adapter_paths=task_specific_adapter_paths,
general_adapter_paths=general_adapter_paths,
arrow_config=arrow_config,
)
# Batch-wise inference
with torch.no_grad():
evaluate_on_multi_choice_batched(
test_dataset,
model,
tokenizer,
args.ds_name,
labels,
predictions,
args,
batch_size=32, # tune this
max_length=512, # tune if options are long
device="cuda",
)

View File

@ -0,0 +1,8 @@
torch
transformers
accelerate
datasets
scikit-learn
tqdm
numpy
bitsandbytes

View File

@ -32,8 +32,14 @@ from utils.args_loader import parse_args
from utils.dataset import make_dataset
detect_model = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, device="cuda:0", flip_input=False)
# Determine the best available device
if torch.cuda.is_available():
device = "cuda:0"
else:
# TODO: xpu support in facealignment will be ready after this PR is merged: https://github.com/1adrianb/face-alignment/pull/371
device = "cpu"
detect_model = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D, device=device, flip_input=False)
# with open('./data/celebhq-text/prompt_val_blip_full.json', 'rt') as f: # fill50k, COCO
# for line in f:
# val_data = json.loads(line)

View File

@ -1,8 +1,10 @@
datasets==2.16.1
diffusers==0.17.1
transformers=>4.48.0
accelerate==0.25.0
diffusers==0.34.0
transformers==4.54.0
accelerate==1.9.0
wandb==0.16.1
scikit-image==0.22.0
opencv-python==4.9.0.80
face-alignment==1.4.1
git+https://github.com/1adrianb/face-alignment.git
huggingface_hub==0.34.3
numpy<2.0.0

View File

@ -42,7 +42,12 @@ from peft import PeftModel # noqa: E402
# Will error if the minimal version of diffusers is not installed. Remove at your own risks.
check_min_version("0.10.0.dev0")
device = torch.device("cuda:0")
if torch.xpu.is_available():
device = "xpu:0"
elif torch.cuda.is_available():
device = "cuda:0"
else:
device = "cpu"
def main(args):

View File

@ -13,7 +13,7 @@ export DATASET_NAME="oftverse/control-celeba-hq"
export CKPT_NAME="checkpoint-${ITER_NUM}"
export OUTPUT_DIR="./output/${DATASET_NAME}/${RUN_NAME}/${CKPT_NAME}"
export CONTROLNET_PATH="${OUTPUT_DIR}/controlnet/model.safetensors"
export UNET_PATH="${OUTPUT_DIR}/unet/${RUN_NAME}"
export UNET_PATH="${OUTPUT_DIR}/unet"
export RESULTS_PATH="${OUTPUT_DIR}/results"

View File

@ -215,7 +215,9 @@ def main(args):
text_encoder.to(accelerator.device, dtype=weight_dtype)
if args.enable_xformers_memory_efficient_attention:
if is_xformers_available():
if accelerator.device.type == "xpu":
logger.warning("XPU doesn't support xformers yet, xformers is not applied.")
elif is_xformers_available():
import xformers
xformers_version = version.parse(xformers.__version__)
@ -513,11 +515,17 @@ def main(args):
break
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(f"GPU Memory before entering the train : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the train (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}")
accelerator.print(
f"GPU Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
f"{accelerator.device.type.upper()} Memory before entering the train : {b2mb(tracemalloc.begin)}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Memory consumed at the end of the train (end-begin): {tracemalloc.used}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the train : {b2mb(tracemalloc.cpu_begin)}")

View File

@ -20,7 +20,7 @@ import torch
from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers.models.attention_processor import AttentionProcessor, AttnProcessor
from diffusers.models.modeling_utils import ModelMixin
from diffusers.models.unet_2d_blocks import (
from diffusers.models.unets.unet_2d_blocks import (
CrossAttnDownBlock2D,
DownBlock2D,
)

View File

@ -20,7 +20,7 @@ import PIL.Image
import torch
from diffusers.pipelines.controlnet.multicontrolnet import MultiControlNetModel
from diffusers.pipelines.controlnet.pipeline_controlnet import StableDiffusionControlNetPipeline
from diffusers.utils import BaseOutput, is_compiled_module, logging
from diffusers.utils import BaseOutput, logging
from torch.nn import functional as F
from utils.light_controlnet import ControlNetModel
@ -302,7 +302,7 @@ class LightControlNetPipeline(StableDiffusionControlNetPipeline):
# corresponds to doing no classifier free guidance.
do_classifier_free_guidance = guidance_scale > 1.0
controlnet = self.controlnet._orig_mod if is_compiled_module(self.controlnet) else self.controlnet
controlnet = self.controlnet._orig_mod if hasattr(self.controlnet, "_orig_mod") else self.controlnet
if isinstance(controlnet, MultiControlNetModel) and isinstance(controlnet_conditioning_scale, float):
controlnet_conditioning_scale = [controlnet_conditioning_scale] * len(controlnet.nets)
@ -426,7 +426,10 @@ class LightControlNetPipeline(StableDiffusionControlNetPipeline):
if hasattr(self, "final_offload_hook") and self.final_offload_hook is not None:
self.unet.to("cpu")
self.controlnet.to("cpu")
torch.cuda.empty_cache()
if torch.cuda.is_available():
torch.cuda.empty_cache()
elif torch.xpu.is_available():
torch.xpu.empty_cache()
if not output_type == "latent":
image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]

View File

@ -13,10 +13,12 @@ def b2mb(x):
# This context manager is used to track the peak memory usage of the process
class TorchTracemalloc:
def __enter__(self):
self.device_type = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
self.device_module = getattr(torch, self.device_type, torch.cuda)
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero
self.begin = torch.cuda.memory_allocated()
self.device_module.empty_cache()
self.device_module.reset_peak_memory_stats() # reset the peak gauge to zero
self.begin = self.device_module.memory_allocated()
self.process = psutil.Process()
self.cpu_begin = self.cpu_mem_used()
@ -46,9 +48,9 @@ class TorchTracemalloc:
self.peak_monitoring = False
gc.collect()
torch.cuda.empty_cache()
self.end = torch.cuda.memory_allocated()
self.peak = torch.cuda.max_memory_allocated()
self.device_module.empty_cache()
self.end = self.device_module.memory_allocated()
self.peak = self.device_module.max_memory_allocated()
self.used = b2mb(self.end - self.begin)
self.peaked = b2mb(self.peak - self.begin)

View File

@ -40,6 +40,7 @@ cd peft/examples/boft_dreambooth
Set up your environment: install PEFT and all the required libraries. At the time of writing this guide, we recommend installing PEFT from source. The following environment setup should work on A100 and H100:
### CUDA
```bash
conda create --name peft python=3.10
conda activate peft
@ -48,6 +49,16 @@ conda install xformers -c xformers
pip install -r requirements.txt
pip install git+https://github.com/huggingface/peft
```
The following environment setup is validated to work on Intel XPU:
### Intel XPU
```bash
conda create --name peft python=3.10
conda activate peft
pip install torch==2.8.0.dev20250615+xpu torchvision==0.23.0.dev20250615+xpu torchaudio==2.8.0.dev20250615+xpu --index-url https://download.pytorch.org/whl/nightly/xpu --no-cache-dir
pip install -r requirements.txt
pip install git+https://github.com/huggingface/peft
```
## Download the data

View File

@ -44,8 +44,10 @@
"outputs": [],
"source": [
"def get_boft_sd_pipeline(\n",
" ckpt_dir, base_model_name_or_path=None, epoch=int, dtype=torch.float32, device=\"cuda\", adapter_name=\"default\"\n",
" ckpt_dir, base_model_name_or_path=None, epoch=int, dtype=torch.float32, device=\"auto\", adapter_name=\"default\"\n",
"):\n",
" if device == \"auto\":\n",
" device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"\n",
" if base_model_name_or_path is None:\n",
" raise ValueError(\"Please specify the base model name or path\")\n",
@ -152,14 +154,6 @@
"image = pipe(prompt, num_inference_steps=50, guidance_scale=7, negative_prompt=negative_prompt).images[0]\n",
"image"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f534eca2-94a4-432b-b092-7149ac44b12f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

View File

@ -1,13 +1,13 @@
transformers=>4.48.0
accelerate==0.25.0
transformers==4.54.0
accelerate==1.9.0
evaluate
tqdm
datasets==2.16.1
diffusers==0.17.1
datasets==4.0.0
diffusers==0.34.0
Pillow
huggingface_hub
safetensors
nb_conda_kernels
ipykernel
ipywidgets
wandb==0.16.1
wandb==0.21.0

View File

@ -139,7 +139,7 @@ def main(args):
cur_class_images = len(list(class_images_dir.iterdir()))
if cur_class_images < args.num_class_images:
torch_dtype = torch.float16 if accelerator.device.type == "cuda" else torch.float32
torch_dtype = torch.float16 if accelerator.device.type in ["cuda", "xpu"] else torch.float32
if args.prior_generation_precision == "fp32":
torch_dtype = torch.float32
elif args.prior_generation_precision == "fp16":
@ -176,6 +176,8 @@ def main(args):
del pipeline
if torch.cuda.is_available():
torch.cuda.empty_cache()
elif torch.xpu.is_available():
torch.xpu.empty_cache()
# Handle the repository creation
if accelerator.is_main_process:
@ -263,7 +265,9 @@ def main(args):
text_encoder.to(accelerator.device, dtype=weight_dtype)
if args.enable_xformers_memory_efficient_attention:
if is_xformers_available():
if accelerator.device.type == "xpu":
logger.warn("XPU hasn't support xformers yet, ignore it.")
elif is_xformers_available():
unet.enable_xformers_memory_efficient_attention()
else:
raise ValueError("xformers is not available. Make sure it is installed correctly")
@ -276,7 +280,7 @@ def main(args):
# Enable TF32 for faster training on Ampere GPUs,
# cf https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices
if args.allow_tf32:
if args.allow_tf32 and torch.cuda.is_available():
torch.backends.cuda.matmul.allow_tf32 = True
if args.scale_lr:
@ -581,18 +585,27 @@ def main(args):
)
del pipeline
torch.cuda.empty_cache()
if torch.cuda.is_available():
torch.cuda.empty_cache()
elif torch.xpu.is_available():
torch.xpu.empty_cache()
if global_step >= args.max_train_steps:
break
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
# Printing the accelerator memory usage details such as allocated memory, peak memory, and total memory usage
if not args.no_tracemalloc:
accelerator.print(f"GPU Memory before entering the train : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the train (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}")
accelerator.print(
f"GPU Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
f"{accelerator.device.type.upper()} Memory before entering the train : {b2mb(tracemalloc.begin)}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Memory consumed at the end of the train (end-begin): {tracemalloc.used}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the train : {b2mb(tracemalloc.cpu_begin)}")

View File

@ -13,10 +13,12 @@ def b2mb(x):
# This context manager is used to track the peak memory usage of the process
class TorchTracemalloc:
def __enter__(self):
self.device_type = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
self.device_module = getattr(torch, self.device_type, torch.cuda)
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero
self.begin = torch.cuda.memory_allocated()
self.device_module.empty_cache()
self.device_module.reset_peak_memory_stats() # reset the peak gauge to zero
self.begin = self.device_module.memory_allocated()
self.process = psutil.Process()
self.cpu_begin = self.cpu_mem_used()
@ -46,9 +48,9 @@ class TorchTracemalloc:
self.peak_monitoring = False
gc.collect()
torch.cuda.empty_cache()
self.end = torch.cuda.memory_allocated()
self.peak = torch.cuda.max_memory_allocated()
self.device_module.empty_cache()
self.end = self.device_module.memory_allocated()
self.peak = self.device_module.max_memory_allocated()
self.used = b2mb(self.end - self.begin)
self.peaked = b2mb(self.peak - self.begin)

View File

@ -33,7 +33,7 @@ trainer = SFTTrainer(
model=peft_model,
args=training_args,
train_dataset=dataset,
tokenizer=tokenizer,
processing_class=tokenizer,
)
trainer.train()
peft_model.save_pretrained("bone-llama-2-7b")

View File

@ -90,7 +90,7 @@ trainer = SFTTrainer(
model=peft_model,
args=script_args,
train_dataset=dataset,
tokenizer=tokenizer,
processing_class=tokenizer,
)
trainer.train()
trainer.save_state()

View File

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "71fbfca2",
"metadata": {},
"outputs": [],
@ -16,10 +16,9 @@
"from torch.utils.data import DataLoader\n",
"from transformers import default_data_collator, get_linear_schedule_with_warmup\n",
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"# Hyper-parameters\n",
"device = \"cuda\"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"model_name_or_path = \"bigscience/bloomz-560m\"\n",
"tokenizer_name_or_path = \"bigscience/bloomz-560m\"\n",
"peft_config = LNTuningConfig(\n",
@ -48,7 +47,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"id": "e1a3648b",
"metadata": {},
"outputs": [
@ -84,9 +83,13 @@
}
],
"source": [
"from datasets import load_dataset\n",
"\n",
"dataset = load_dataset(\"ought/raft\", dataset_name)\n",
"dataset = load_dataset(\n",
" \"parquet\",\n",
" data_files={\n",
" \"train\": f\"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/train/0000.parquet\",\n",
" \"test\": f\"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/test/0000.parquet\"\n",
" }\n",
")\n",
"\n",
"classes = [k.replace(\"_\", \" \") for k in dataset[\"train\"].features[\"Label\"].names]\n",
"print(classes)\n",

View File

@ -1,481 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "71fbfca2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===================================BUG REPORT===================================\n",
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
"================================================================================\n",
"CUDA SETUP: CUDA runtime path found: /home/sourab/miniconda3/envs/ml/lib/libcudart.so\n",
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
"CUDA SETUP: Detected CUDA version 117\n",
"CUDA SETUP: Loading binary /home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
]
}
],
"source": [
"from transformers import AutoModelForCausalLM\n",
"from peft import PeftModel, PeftConfig\n",
"import torch\n",
"from datasets import load_dataset\n",
"import os\n",
"from transformers import AutoTokenizer\n",
"from torch.utils.data import DataLoader\n",
"from transformers import default_data_collator, get_linear_schedule_with_warmup\n",
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"device = \"cuda\"\n",
"model_name_or_path = \"bigscience/bloomz-7b1\"\n",
"tokenizer_name_or_path = \"bigscience/bloomz-7b1\"\n",
"dataset_name = \"twitter_complaints\"\n",
"text_column = \"Tweet text\"\n",
"label_column = \"text_label\"\n",
"max_length = 64\n",
"lr = 1e-3\n",
"num_epochs = 50\n",
"batch_size = 8"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a3648b",
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
"dataset = load_dataset(\"ought/raft\", dataset_name)\n",
"\n",
"classes = [k.replace(\"_\", \" \") for k in dataset[\"train\"].features[\"Label\"].names]\n",
"print(classes)\n",
"dataset = dataset.map(\n",
" lambda x: {\"text_label\": [classes[label] for label in x[\"Label\"]]},\n",
" batched=True,\n",
" num_proc=1,\n",
")\n",
"print(dataset)\n",
"dataset[\"train\"][0]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "fe12d4d3",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "10cabeec92ab428f9a660ebaecbaf865",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/1 [00:00<?, ?ba/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "8a344e989ab34c71b230acee68b477e8",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/4 [00:00<?, ?ba/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# data preprocessing\n",
"tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)\n",
"if tokenizer.pad_token_id is None:\n",
" tokenizer.pad_token_id = tokenizer.eos_token_id\n",
"target_max_length = max([len(tokenizer(class_label)[\"input_ids\"]) for class_label in classes])\n",
"print(target_max_length)\n",
"\n",
"\n",
"def preprocess_function(examples):\n",
" batch_size = len(examples[text_column])\n",
" inputs = [f\"{text_column} : {x} Label : \" for x in examples[text_column]]\n",
" targets = [str(x) for x in examples[label_column]]\n",
" model_inputs = tokenizer(inputs)\n",
" labels = tokenizer(targets, add_special_tokens=False) # don't add bos token because we concatenate with inputs\n",
" for i in range(batch_size):\n",
" sample_input_ids = model_inputs[\"input_ids\"][i]\n",
" label_input_ids = labels[\"input_ids\"][i] + [tokenizer.eos_token_id]\n",
" # print(i, sample_input_ids, label_input_ids)\n",
" model_inputs[\"input_ids\"][i] = sample_input_ids + label_input_ids\n",
" labels[\"input_ids\"][i] = [-100] * len(sample_input_ids) + label_input_ids\n",
" model_inputs[\"attention_mask\"][i] = [1] * len(model_inputs[\"input_ids\"][i])\n",
" # print(model_inputs)\n",
" for i in range(batch_size):\n",
" sample_input_ids = model_inputs[\"input_ids\"][i]\n",
" label_input_ids = labels[\"input_ids\"][i]\n",
" model_inputs[\"input_ids\"][i] = [tokenizer.pad_token_id] * (\n",
" max_length - len(sample_input_ids)\n",
" ) + sample_input_ids\n",
" model_inputs[\"attention_mask\"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[\n",
" \"attention_mask\"\n",
" ][i]\n",
" labels[\"input_ids\"][i] = [-100] * (max_length - len(sample_input_ids)) + label_input_ids\n",
" model_inputs[\"input_ids\"][i] = torch.tensor(model_inputs[\"input_ids\"][i][:max_length])\n",
" model_inputs[\"attention_mask\"][i] = torch.tensor(model_inputs[\"attention_mask\"][i][:max_length])\n",
" labels[\"input_ids\"][i] = torch.tensor(labels[\"input_ids\"][i][:max_length])\n",
" model_inputs[\"labels\"] = labels[\"input_ids\"]\n",
" return model_inputs\n",
"\n",
"\n",
"processed_datasets = dataset.map(\n",
" preprocess_function,\n",
" batched=True,\n",
" num_proc=1,\n",
" remove_columns=dataset[\"train\"].column_names,\n",
" load_from_cache_file=False,\n",
" desc=\"Running tokenizer on dataset\",\n",
")\n",
"\n",
"train_dataset = processed_datasets[\"train\"]\n",
"\n",
"\n",
"train_dataloader = DataLoader(\n",
" train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2795b9d0",
"metadata": {},
"outputs": [],
"source": [
"def test_preprocess_function(examples):\n",
" batch_size = len(examples[text_column])\n",
" inputs = [f\"{text_column} : {x} Label : \" for x in examples[text_column]]\n",
" model_inputs = tokenizer(inputs)\n",
" # print(model_inputs)\n",
" for i in range(batch_size):\n",
" sample_input_ids = model_inputs[\"input_ids\"][i]\n",
" model_inputs[\"input_ids\"][i] = [tokenizer.pad_token_id] * (\n",
" max_length - len(sample_input_ids)\n",
" ) + sample_input_ids\n",
" model_inputs[\"attention_mask\"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[\n",
" \"attention_mask\"\n",
" ][i]\n",
" model_inputs[\"input_ids\"][i] = torch.tensor(model_inputs[\"input_ids\"][i][:max_length])\n",
" model_inputs[\"attention_mask\"][i] = torch.tensor(model_inputs[\"attention_mask\"][i][:max_length])\n",
" return model_inputs\n",
"\n",
"\n",
"processed_datasets = dataset.map(\n",
" test_preprocess_function,\n",
" batched=True,\n",
" num_proc=1,\n",
" remove_columns=dataset[\"train\"].column_names,\n",
" load_from_cache_file=False,\n",
" desc=\"Running tokenizer on dataset\",\n",
")\n",
"\n",
"eval_dataset = processed_datasets[\"train\"]\n",
"test_dataset = processed_datasets[\"test\"]\n",
"\n",
"eval_dataloader = DataLoader(eval_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)\n",
"test_dataloader = DataLoader(test_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)\n",
"print(next(iter(eval_dataloader)))\n",
"print(next(iter(test_dataloader)))"
]
},
{
"cell_type": "markdown",
"id": "42b14a11",
"metadata": {},
"source": [
"You can load model from hub or local\n",
"\n",
"- Load model from Hugging Face Hub, you can change to your own model id\n",
"```python\n",
"peft_model_id = \"username/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM\"\n",
"```\n",
"- Or load model form local\n",
"```python\n",
"peft_model_id = \"twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9caac014",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/sourab/pet/src/peft/tuners/lora.py:143: UserWarning: fan_in_fan_out is set to True but the target module is not a Conv1D. Setting fan_in_fan_out to False.\n",
" warnings.warn(\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "bc38030106a14173a1363eb1ee388eda",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Downloading: 0%| | 0.00/15.8M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from peft import PeftModel, PeftConfig\n",
"\n",
"max_memory = {0: \"1GIB\", 1: \"1GIB\", 2: \"2GIB\", 3: \"10GIB\", \"cpu\": \"30GB\"}\n",
"peft_model_id = \"smangrul/twitter_complaints_bigscience_bloomz-7b1_LORA_CAUSAL_LM\"\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, device_map=\"auto\", max_memory=max_memory)\n",
"model = PeftModel.from_pretrained(model, peft_model_id, device_map=\"auto\", max_memory=max_memory)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "6fac10b5",
"metadata": {},
"outputs": [],
"source": [
"# model"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "2a08ee6d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'base_model.model.transformer.word_embeddings': 3,\n",
" 'base_model.model.lm_head': 3,\n",
" 'base_model.model.transformer.word_embeddings_layernorm': 3,\n",
" 'base_model.model.transformer.h.0': 3,\n",
" 'base_model.model.transformer.h.1': 3,\n",
" 'base_model.model.transformer.h.2': 3,\n",
" 'base_model.model.transformer.h.3': 3,\n",
" 'base_model.model.transformer.h.4': 3,\n",
" 'base_model.model.transformer.h.5': 3,\n",
" 'base_model.model.transformer.h.6': 3,\n",
" 'base_model.model.transformer.h.7': 3,\n",
" 'base_model.model.transformer.h.8': 'cpu',\n",
" 'base_model.model.transformer.h.9': 'cpu',\n",
" 'base_model.model.transformer.h.10': 'cpu',\n",
" 'base_model.model.transformer.h.11': 'cpu',\n",
" 'base_model.model.transformer.h.12': 'cpu',\n",
" 'base_model.model.transformer.h.13': 'cpu',\n",
" 'base_model.model.transformer.h.14': 'cpu',\n",
" 'base_model.model.transformer.h.15': 'cpu',\n",
" 'base_model.model.transformer.h.16': 'cpu',\n",
" 'base_model.model.transformer.h.17': 'cpu',\n",
" 'base_model.model.transformer.h.18': 'cpu',\n",
" 'base_model.model.transformer.h.19': 'cpu',\n",
" 'base_model.model.transformer.h.20': 'cpu',\n",
" 'base_model.model.transformer.h.21': 'cpu',\n",
" 'base_model.model.transformer.h.22': 'cpu',\n",
" 'base_model.model.transformer.h.23': 'cpu',\n",
" 'base_model.model.transformer.h.24': 'cpu',\n",
" 'base_model.model.transformer.h.25': 'cpu',\n",
" 'base_model.model.transformer.h.26': 'cpu',\n",
" 'base_model.model.transformer.h.27': 'cpu',\n",
" 'base_model.model.transformer.h.28': 'cpu',\n",
" 'base_model.model.transformer.h.29': 'cpu',\n",
" 'base_model.model.transformer.ln_f': 'cpu'}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.hf_device_map"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "b33be5e6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@HondaCustSvc Your customer service has been horrible during the recall process. I will never purchase a Honda again.\n",
"{'input_ids': tensor([[227985, 5484, 915, 2566, 216744, 38, 1316, 54, 42705,\n",
" 32465, 52166, 9440, 1809, 3784, 88483, 9411, 368, 84342,\n",
" 4451, 17, 473, 2152, 11705, 82406, 267, 51591, 5734,\n",
" 17, 77658, 915, 210]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1]])}\n",
"tensor([[227985, 5484, 915, 2566, 216744, 38, 1316, 54, 42705,\n",
" 32465, 52166, 9440, 1809, 3784, 88483, 9411, 368, 84342,\n",
" 4451, 17, 473, 2152, 11705, 82406, 267, 51591, 5734,\n",
" 17, 77658, 915, 210, 16449, 5952, 3, 3, 3,\n",
" 3, 3, 3, 3, 3]])\n",
"['Tweet text : @HondaCustSvc Your customer service has been horrible during the recall process. I will never purchase a Honda again. Label : complaint']\n"
]
}
],
"source": [
"model.eval()\n",
"i = 89\n",
"inputs = tokenizer(f'{text_column} : {dataset[\"test\"][i][\"Tweet text\"]} Label : ', return_tensors=\"pt\")\n",
"print(dataset[\"test\"][i][\"Tweet text\"])\n",
"print(inputs)\n",
"\n",
"with torch.no_grad():\n",
" outputs = model.generate(input_ids=inputs[\"input_ids\"], max_new_tokens=10)\n",
" print(outputs)\n",
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b6d6cd5b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [01:42<00:00, 14.70s/it]\n"
]
}
],
"source": [
"model.eval()\n",
"eval_preds = []\n",
"for _, batch in enumerate(tqdm(eval_dataloader)):\n",
" batch = {k: v for k, v in batch.items() if k != \"labels\"}\n",
" with torch.no_grad():\n",
" outputs = model.generate(**batch, max_new_tokens=10)\n",
" preds = outputs[:, max_length:].detach().cpu().numpy()\n",
" eval_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "61264abe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy=100.0\n",
"eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n",
"dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n"
]
}
],
"source": [
"correct = 0\n",
"total = 0\n",
"for pred, true in zip(eval_preds, dataset[\"train\"][label_column]):\n",
" if pred.strip() == true.strip():\n",
" correct += 1\n",
" total += 1\n",
"accuracy = correct / total * 100\n",
"print(f\"{accuracy=}\")\n",
"print(f\"{eval_preds[:10]=}\")\n",
"print(f\"{dataset['train'][label_column][:10]=}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a70802a3",
"metadata": {},
"outputs": [],
"source": [
"model.eval()\n",
"test_preds = []\n",
"\n",
"for _, batch in enumerate(tqdm(test_dataloader)):\n",
" batch = {k: v for k, v in batch.items() if k != \"labels\"}\n",
" with torch.no_grad():\n",
" outputs = model.generate(**batch, max_new_tokens=10)\n",
" preds = outputs[:, max_length:].detach().cpu().numpy()\n",
" test_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))\n",
" if len(test_preds) > 100:\n",
" break\n",
"test_preds"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1c4ad9c",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
},
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@ -61,9 +61,11 @@ def b2mb(x):
class TorchTracemalloc:
def __enter__(self):
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero
self.begin = torch.cuda.memory_allocated()
self.device_type = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
self.device_module = getattr(torch, self.device_type, torch.cuda)
self.device_module.empty_cache()
self.device_module.reset_peak_memory_stats() # reset the peak gauge to zero
self.begin = self.device_module.memory_allocated()
self.process = psutil.Process()
self.cpu_begin = self.cpu_mem_used()
@ -93,9 +95,9 @@ class TorchTracemalloc:
self.peak_monitoring = False
gc.collect()
torch.cuda.empty_cache()
self.end = torch.cuda.memory_allocated()
self.peak = torch.cuda.max_memory_allocated()
self.device_module.empty_cache()
self.end = self.device_module.memory_allocated()
self.peak = self.device_module.max_memory_allocated()
self.used = b2mb(self.end - self.begin)
self.peaked = b2mb(self.peak - self.begin)
@ -120,7 +122,13 @@ def main():
do_test = False
set_seed(seed)
dataset = load_dataset("ought/raft", dataset_name)
dataset = load_dataset(
"parquet",
data_files={
"train": f"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/train/0000.parquet",
"test": f"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/test/0000.parquet",
},
)
classes = [k.replace("_", " ") for k in dataset["train"].features["Label"].names]
dataset = dataset.map(
lambda x: {"text_label": [classes[label] for label in x["Label"]]},
@ -162,7 +170,6 @@ def main():
batch_size = len(examples[text_column])
inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
model_inputs = tokenizer(inputs)
# print(model_inputs)
for i in range(batch_size):
sample_input_ids = model_inputs["input_ids"][i]
model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
@ -248,12 +255,18 @@ def main():
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(f"GPU Memory before entering the train : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the train (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}")
# Printing the memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(
f"GPU Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
f"{accelerator.device.type.upper()} Memory before entering the train : {b2mb(tracemalloc.begin)}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Memory consumed at the end of the train (end-begin): {tracemalloc.used}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the train : {b2mb(tracemalloc.cpu_begin)}")
@ -280,12 +293,18 @@ def main():
preds = preds[:, max_length:].detach().cpu().numpy()
eval_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(f"GPU Memory before entering the eval : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the eval (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the eval (max-begin): {tracemalloc.peaked}")
# Printing the memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(
f"GPU Total Peak Memory consumed during the eval (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
f"{accelerator.device.type.upper()} Memory before entering the eval : {b2mb(tracemalloc.begin)}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Memory consumed at the end of the eval (end-begin): {tracemalloc.used}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Peak Memory consumed during the eval (max-begin): {tracemalloc.peaked}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Total Peak Memory consumed during the eval (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the eval : {b2mb(tracemalloc.cpu_begin)}")

View File

@ -26,14 +26,13 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "6f864c90",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"3\"\n",
"os.environ[\"WANDB_PROJECT\"] = \"PeftExamples\"\n",
"import transformers\n",
"from peft import (\n",
@ -740,7 +739,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"id": "71851793",
"metadata": {},
"outputs": [
@ -763,7 +762,8 @@
"context = dataset[\"test\"][i][\"context\"]\n",
"\n",
"batch = tokenizer(context, return_tensors=\"pt\")\n",
"batch = {k: v.to(\"cuda\") for k, v in batch.items()}\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"batch = {k: v.to(device) for k, v in batch.items()}\n",
"model.eval()\n",
"output_tokens = model.generate(\n",
" **batch,\n",
@ -892,7 +892,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"id": "589c46d7-d567-40b4-ab7d-e0a9e1cab40e",
"metadata": {},
"outputs": [
@ -961,7 +961,7 @@
"inference_model.resize_token_embeddings(len(tokenizer))\n",
"\n",
"inference_model = PeftModel.from_pretrained(inference_model, \"smangrul/mistral_lora_clm_with_added_tokens\")\n",
"inference_model.to(\"cuda\")\n",
"inference_model.to(device)\n",
"inference_model.eval()\n",
"\n",
"output_tokens = inference_model.generate(\n",

View File

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "71fbfca2",
"metadata": {},
"outputs": [],
@ -16,9 +16,8 @@
"from torch.utils.data import DataLoader\n",
"from transformers import default_data_collator, get_linear_schedule_with_warmup\n",
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"device = \"cuda\"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"model_name_or_path = \"bigscience/bloomz-560m\"\n",
"tokenizer_name_or_path = \"bigscience/bloomz-560m\"\n",
"peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=30)\n",
@ -37,7 +36,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"id": "e1a3648b",
"metadata": {},
"outputs": [
@ -102,9 +101,14 @@
}
],
"source": [
"from datasets import load_dataset\n",
"dataset = load_dataset(\n",
" \"parquet\",\n",
" data_files={\n",
" \"train\": f\"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/train/0000.parquet\",\n",
" \"test\": f\"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/test/0000.parquet\"\n",
" }\n",
")\n",
"\n",
"dataset = load_dataset(\"ought/raft\", dataset_name)\n",
"\n",
"classes = [k.replace(\"_\", \" \") for k in dataset[\"train\"].features[\"Label\"].names]\n",
"print(classes)\n",
@ -318,24 +322,6 @@
"model.print_trainable_parameters()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "bd419634",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"trainable params: 1474560 || all params: 560689152 || trainable%: 0.26299064191632515\n"
]
}
],
"source": [
"model.print_trainable_parameters()"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -1276,7 +1262,7 @@
"metadata": {},
"outputs": [],
"source": [
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
"ckpt = f\"{peft_model_id}/adapter_model.safetensors\"\n",
"!du -h $ckpt"
]
},

View File

@ -16,9 +16,8 @@
"from torch.utils.data import DataLoader\n",
"from transformers import default_data_collator, get_linear_schedule_with_warmup\n",
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"device = \"cuda\"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"model_name_or_path = \"bigscience/bloomz-560m\"\n",
"tokenizer_name_or_path = \"bigscience/bloomz-560m\"\n",
"peft_config = PromptTuningConfig(\n",
@ -48,9 +47,13 @@
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
"dataset = load_dataset(\"ought/raft\", dataset_name)\n",
"dataset = load_dataset(\n",
" \"parquet\",\n",
" data_files={\n",
" \"train\": f\"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/train/0000.parquet\",\n",
" \"test\": f\"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/test/0000.parquet\"\n",
" }\n",
")\n",
"\n",
"classes = [k.replace(\"_\", \" \") for k in dataset[\"train\"].features[\"Label\"].names]\n",
"print(classes)\n",
@ -1115,24 +1118,12 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"id": "4928c7f1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
"To disable this warning, you can either:\n",
"\t- Avoid using `tokenizers` before the fork if possible\n",
"\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n",
"36K\tbigscience/bloomz-560m_PROMPT_TUNING_CAUSAL_LM/adapter_model.bin\n"
]
}
],
"outputs": [],
"source": [
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
"ckpt = f\"{peft_model_id}/adapter_model.safetensors\"\n",
"!du -h $ckpt"
]
},
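The checkpoint cells in this notebook (and in the one above) now point at `adapter_model.safetensors` instead of `adapter_model.bin`, matching what recent PEFT releases write by default. A minimal sketch, assuming `model` and `peft_config` are the PEFT-wrapped model and config built earlier in the notebook:

# save_pretrained writes adapter_config.json plus adapter_model.safetensors by default.
peft_model_id = f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}"
model.save_pretrained(peft_model_id)
ckpt = f"{peft_model_id}/adapter_model.safetensors"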

View File

@ -1,6 +1,7 @@
transformers
transformers<4.54.0
accelerate
evaluate
deepspeed
tqdm
datasets
dataclass-csv
datasets==3.6.0

View File

@ -9,12 +9,13 @@
},
"outputs": [],
"source": [
"import torch\n",
"from datasets import load_dataset\n",
"from transformers import set_seed, AutoModelForSeq2SeqLM, AutoTokenizer\n",
"from peft import get_peft_model, MultitaskPromptTuningConfig, TaskType, MultitaskPromptTuningInit\n",
"\n",
"set_seed(42)\n",
"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"model_name = \"google/flan-t5-base\"\n",
"\n",
"peft_config = MultitaskPromptTuningConfig(\n",
@ -31,18 +32,18 @@
"model = AutoModelForSeq2SeqLM.from_pretrained(model_name)\n",
"model = get_peft_model(model, peft_config)\n",
"\n",
"model = model.cuda()\n",
"model = model.to(device)\n",
"\n",
"\n",
"def send_to_device(batch):\n",
" for i in batch:\n",
" batch[i] = batch[i].cuda()\n",
" batch[i] = batch[i].to(device)\n",
" return batch"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 9,
"id": "eb112bc1-ffaf-49fa-a216-0d601ec304ee",
"metadata": {
"tags": []
@ -86,7 +87,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 10,
"id": "e5a16ec4-8fef-4ba9-95b6-a661eb51e50c",
"metadata": {
"tags": []
@ -159,7 +160,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 11,
"id": "cceecc94-f43a-4f62-8d45-926f2f02f36d",
"metadata": {
"tags": []
@ -293,7 +294,7 @@
" num_tasks=1,\n",
" task_type=TaskType.SEQ_2_SEQ_LM,\n",
" prompt_tuning_init=MultitaskPromptTuningInit.EXACT_SOURCE_TASK,\n",
" prompt_tuning_init_state_dict_path=\"checkpoints_source/50000/adapter_model.bin\",\n",
" prompt_tuning_init_state_dict_path=\"checkpoints_source/50000/adapter_model.safetensors\",\n",
" num_virtual_tokens=50,\n",
" num_transformer_submodules=1,\n",
")\n",
@ -302,7 +303,7 @@
"model = AutoModelForSeq2SeqLM.from_pretrained(model_name)\n",
"model = get_peft_model(model, peft_config)\n",
"\n",
"model = model.cuda()"
"model = model.to(device)"
]
},
{
@ -360,8 +361,9 @@
"source": [
"# load last checkpoint for now\n",
"from peft import set_peft_model_state_dict\n",
"from safetensors.torch import load_file\n",
"\n",
"sd_6000 = torch.load(\"checkpoints_target/6000/adapter_model.bin\")\n",
"sd_6000 = load_file(\"checkpoints_target/6000/adapter_model.safetensors\")\n",
"set_peft_model_state_dict(model, sd_6000)\n",
"\n",
"# evaluate val\n",
@ -382,6 +384,22 @@
"f1 = {f1}\"\"\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d18325c-9607-4cb5-a5b0-5b44dfee2a75",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "43988e92-af42-45cb-8bca-f19c193ad04f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@ -400,7 +418,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.13"
"version": "3.11.13"
}
},
"nbformat": 4,

View File

@ -11,7 +11,7 @@ from peft import AdaLoraConfig, PeftConfig, PeftModel, TaskType, get_peft_model
os.environ["TOKENIZERS_PARALLELISM"] = "false"
device = "cuda"
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model_name_or_path = "facebook/bart-base"
tokenizer_name_or_path = "facebook/bart-base"
@ -24,6 +24,20 @@ num_epochs = 8
batch_size = 8
# loading dataset
dataset = load_dataset("financial_phrasebank", "sentences_allagree")
dataset = dataset["train"].train_test_split(test_size=0.1)
dataset["validation"] = dataset["test"]
del dataset["test"]
classes = dataset["train"].features["label"].names
dataset = dataset.map(
lambda x: {"text_label": [classes[label] for label in x["label"]]},
batched=True,
num_proc=1,
)
# creating model
peft_config = AdaLoraConfig(
init_r=12,
@ -37,6 +51,7 @@ peft_config = AdaLoraConfig(
lora_dropout=0.1,
task_type=TaskType.SEQ_2_SEQ_LM,
inference_mode=False,
total_step=len(dataset["train"]) * num_epochs,
)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
@ -44,20 +59,6 @@ model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# loading dataset
dataset = load_dataset("financial_phrasebank", "sentences_allagree")
dataset = dataset["train"].train_test_split(test_size=0.1)
dataset["validation"] = dataset["test"]
del dataset["test"]
classes = dataset["train"].features["label"].names
dataset = dataset.map(
lambda x: {"text_label": [classes[label] for label in x["label"]]},
batched=True,
num_proc=1,
)
# data preprocessing
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
@ -159,7 +160,7 @@ peft_model_id = f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task
model.save_pretrained(peft_model_id)
ckpt = f"{peft_model_id}/adapter_model.bin"
ckpt = f"{peft_model_id}/adapter_model.safetensors"
# get_ipython().system('du -h $ckpt')
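Besides the device and safetensors updates, this script now builds the train/validation split before the config, because `AdaLoraConfig` needs a `total_step` budget for its rank-allocation schedule. A minimal sketch of that ordering, using only the hyperparameters visible in the hunks above (the remaining AdaLoRA arguments keep their defaults in this sketch):

from peft import AdaLoraConfig, TaskType

num_epochs = 8
# `dataset` is the train/validation split created above; total_step must span the whole run.
peft_config = AdaLoraConfig(
    init_r=12,
    lora_dropout=0.1,
    task_type=TaskType.SEQ_2_SEQ_LM,
    inference_mode=False,
    total_step=len(dataset["train"]) * num_epochs,
)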

View File

@ -2,7 +2,8 @@
"cells": [
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"id": "0c152fc8",
"metadata": {
"id": "5f93b7d1"
},
@ -22,7 +23,7 @@
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"device = \"cuda\"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"model_name_or_path = \"bigscience/mt0-large\"\n",
"tokenizer_name_or_path = \"bigscience/mt0-large\"\n",
"\n",
@ -37,7 +38,8 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 2,
"id": "4e23624f",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -49,10 +51,10 @@
{
"data": {
"text/plain": [
"<module 'peft' from '/usr/local/lib/python3.10/dist-packages/peft/__init__.py'>"
"<module 'peft' from '/usr/local/lib/python3.11/dist-packages/peft/__init__.py'>"
]
},
"execution_count": 13,
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
@ -65,7 +67,8 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": null,
"id": "da74b569",
"metadata": {
"id": "8d0850ac"
},
@ -79,7 +82,8 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 4,
"id": "df33fce2",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -233,7 +237,7 @@
")"
]
},
"execution_count": 15,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@ -244,7 +248,8 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 5,
"id": "63d7bc2d",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -257,7 +262,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"trainable params: 282,624 || all params: 1,229,863,936 || trainable%: 0.022980103060766553\n"
"trainable params: 282,624 || all params: 1,229,863,936 || trainable%: 0.0230\n"
]
},
{
@ -276,11 +281,11 @@
" (SelfAttention): MT5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (v): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
@ -293,7 +298,7 @@
" (DenseReluDense): MT5DenseGatedActDense(\n",
" (wi_0): Linear(in_features=1024, out_features=2816, bias=False)\n",
" (wi_1): Linear(\n",
" in_features=1024, out_features=2816, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=2816, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 2816x1])\n",
" )\n",
" (wo): Linear(in_features=2816, out_features=1024, bias=False)\n",
@ -311,11 +316,11 @@
" (SelfAttention): MT5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (v): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
@ -327,7 +332,7 @@
" (DenseReluDense): MT5DenseGatedActDense(\n",
" (wi_0): Linear(in_features=1024, out_features=2816, bias=False)\n",
" (wi_1): Linear(\n",
" in_features=1024, out_features=2816, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=2816, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 2816x1])\n",
" )\n",
" (wo): Linear(in_features=2816, out_features=1024, bias=False)\n",
@ -352,11 +357,11 @@
" (SelfAttention): MT5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (v): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
@ -369,11 +374,11 @@
" (EncDecAttention): MT5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (v): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
@ -385,7 +390,7 @@
" (DenseReluDense): MT5DenseGatedActDense(\n",
" (wi_0): Linear(in_features=1024, out_features=2816, bias=False)\n",
" (wi_1): Linear(\n",
" in_features=1024, out_features=2816, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=2816, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 2816x1])\n",
" )\n",
" (wo): Linear(in_features=2816, out_features=1024, bias=False)\n",
@ -403,11 +408,11 @@
" (SelfAttention): MT5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (v): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
@ -419,11 +424,11 @@
" (EncDecAttention): MT5Attention(\n",
" (q): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (k): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (v): Linear(\n",
" in_features=1024, out_features=1024, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=1024, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 1024x1])\n",
" )\n",
" (o): Linear(in_features=1024, out_features=1024, bias=False)\n",
@ -435,7 +440,7 @@
" (DenseReluDense): MT5DenseGatedActDense(\n",
" (wi_0): Linear(in_features=1024, out_features=2816, bias=False)\n",
" (wi_1): Linear(\n",
" in_features=1024, out_features=2816, bias=False\n",
" (base_layer): Linear(in_features=1024, out_features=2816, bias=False)\n",
" (ia3_l): ParameterDict( (default): Parameter containing: [torch.FloatTensor of size 2816x1])\n",
" )\n",
" (wo): Linear(in_features=2816, out_features=1024, bias=False)\n",
@ -457,7 +462,7 @@
")"
]
},
"execution_count": 16,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@ -470,7 +475,8 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 6,
"id": "155b8728",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@ -519,27 +525,14 @@
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:datasets.builder:Found cached dataset financial_phrasebank (/root/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
"Using the latest cached version of the dataset since financial_phrasebank couldn't be found on the Hugging Face Hub\n",
"Found the latest cached dataset configuration 'sentences_allagree' at /root/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141 (last modified on Thu Jul 31 03:15:41 2025).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "bbfb7533b5ca459194e171df56b79566",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "9e12d97af6124a5a8c6627708b300c1e",
"model_id": "43b03e9b6de94bf0921228482d7be1e5",
"version_major": 2,
"version_minor": 0
},
@ -553,7 +546,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0c561dab67914ea9b6e1aab803600551",
"model_id": "d08de1efca67472781017b806f33870c",
"version_major": 2,
"version_minor": 0
},
@ -567,12 +560,12 @@
{
"data": {
"text/plain": [
"{'sentence': 'It will be operated by Nokia , and supported by its Nokia NetAct network and service management system .',\n",
"{'sentence': 'SCOPI Chief Business Excellence Officer , Eng .',\n",
" 'label': 1,\n",
" 'text_label': 'neutral'}"
]
},
"execution_count": 17,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@ -596,7 +589,8 @@
},
{
"cell_type": "code",
"execution_count": 18,
"execution_count": 7,
"id": "723fb67d",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@ -633,7 +627,63 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e1e80a68a9e7429397cafc96c3c11f80",
"model_id": "7e08a312e5454c188f52fc2ca902c463",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tokenizer_config.json: 0%| | 0.00/430 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "25d5de12709748c9959cd011c5c641de",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"spiece.model: 0%| | 0.00/4.31M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5b39c130813843c18e7f9187ffec37df",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"tokenizer.json: 0%| | 0.00/16.3M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "de27076e123243fd89dbad1c9e1f0596",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"special_tokens_map.json: 0%| | 0.00/74.0 [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1b55669bf13a4e2886f34c12d5f50354",
"version_major": 2,
"version_minor": 0
},
@ -647,7 +697,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "21f582e1208a4a38ae3c0cdce87e5c14",
"model_id": "f914229f180b4188925d9e804b92475c",
"version_major": 2,
"version_minor": 0
},
@ -695,7 +745,8 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 8,
"id": "36d56ea7",
"metadata": {
"id": "f733a3c6"
},
@ -712,7 +763,8 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 9,
"id": "6b0a0536",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -725,45 +777,45 @@
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 255/255 [02:33<00:00, 1.67it/s]\n",
"100%|██████████| 29/29 [00:08<00:00, 3.48it/s]\n"
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:52<00:00, 4.86it/s]\n",
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 12.67it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=0: train_ppl=tensor(1.4939, device='cuda:0') train_epoch_loss=tensor(0.4014, device='cuda:0') eval_ppl=tensor(1.0514, device='cuda:0') eval_epoch_loss=tensor(0.0501, device='cuda:0')\n"
"epoch=0: train_ppl=tensor(1.4686, device='xpu:0') train_epoch_loss=tensor(0.3843, device='xpu:0') eval_ppl=tensor(1.0421, device='xpu:0') eval_epoch_loss=tensor(0.0412, device='xpu:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 255/255 [02:32<00:00, 1.67it/s]\n",
"100%|██████████| 29/29 [00:08<00:00, 3.43it/s]\n"
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:49<00:00, 5.20it/s]\n",
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 13.62it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=1: train_ppl=tensor(1.0523, device='cuda:0') train_epoch_loss=tensor(0.0510, device='cuda:0') eval_ppl=tensor(1.0383, device='cuda:0') eval_epoch_loss=tensor(0.0376, device='cuda:0')\n"
"epoch=1: train_ppl=tensor(1.0683, device='xpu:0') train_epoch_loss=tensor(0.0661, device='xpu:0') eval_ppl=tensor(1.0264, device='xpu:0') eval_epoch_loss=tensor(0.0261, device='xpu:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 255/255 [02:32<00:00, 1.68it/s]\n",
"100%|██████████| 29/29 [00:08<00:00, 3.44it/s]"
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:49<00:00, 5.20it/s]\n",
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 13.63it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=2: train_ppl=tensor(1.0397, device='cuda:0') train_epoch_loss=tensor(0.0389, device='cuda:0') eval_ppl=tensor(1.0392, device='cuda:0') eval_epoch_loss=tensor(0.0385, device='cuda:0')\n"
"epoch=2: train_ppl=tensor(1.0451, device='xpu:0') train_epoch_loss=tensor(0.0441, device='xpu:0') eval_ppl=tensor(1.0191, device='xpu:0') eval_epoch_loss=tensor(0.0190, device='xpu:0')\n"
]
},
{
@ -814,6 +866,7 @@
{
"cell_type": "code",
"execution_count": 21,
"id": "761b90e4",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -849,6 +902,7 @@
{
"cell_type": "code",
"execution_count": 22,
"id": "8e0658ac",
"metadata": {
"id": "a8de6005"
},
@ -861,7 +915,8 @@
},
{
"cell_type": "code",
"execution_count": 23,
"execution_count": null,
"id": "ef7fbf9c",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -874,18 +929,19 @@
"name": "stdout",
"output_type": "stream",
"text": [
"1.2M\tbigscience/mt0-large_IA3_SEQ_2_SEQ_LM/adapter_model.bin\n"
"1.2M\tbigscience/mt0-large_IA3_SEQ_2_SEQ_LM/adapter_model.safetensors\n"
]
}
],
"source": [
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
"ckpt = f\"{peft_model_id}/adapter_model.safetensors\"\n",
"!du -h $ckpt"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "4774d931",
"metadata": {
"id": "76c2fc29"
},
@ -903,6 +959,7 @@
{
"cell_type": "code",
"execution_count": 25,
"id": "996ddf0a",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -946,6 +1003,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "701eda1b",
"metadata": {
"id": "66c65ea4"
},
@ -955,6 +1013,7 @@
{
"cell_type": "code",
"execution_count": null,
"id": "7d7718c5",
"metadata": {
"id": "65e71f78"
},
@ -970,7 +1029,7 @@
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -984,7 +1043,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
"version": "3.11.13"
},
"vscode": {
"interpreter": {

View File

@ -2,26 +2,10 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "5f93b7d1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===================================BUG REPORT===================================\n",
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
"================================================================================\n",
"CUDA SETUP: CUDA runtime path found: /home/sourab/miniconda3/envs/ml/lib/libcudart.so\n",
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
"CUDA SETUP: Detected CUDA version 117\n",
"CUDA SETUP: Loading binary /home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
]
}
],
"outputs": [],
"source": [
"from transformers import AutoModelForSeq2SeqLM\n",
"from peft import get_peft_config, get_peft_model, get_peft_model_state_dict, LoraConfig, TaskType\n",
@ -36,7 +20,7 @@
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"device = \"cuda\"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"model_name_or_path = \"bigscience/mt0-large\"\n",
"tokenizer_name_or_path = \"bigscience/mt0-large\"\n",
"\n",
@ -51,7 +35,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"id": "8d0850ac",
"metadata": {},
"outputs": [],
@ -75,18 +59,19 @@
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset financial_phrasebank (/home/sourab/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
"Using the latest cached version of the dataset since financial_phrasebank couldn't be found on the Hugging Face Hub\n",
"Found the latest cached dataset configuration 'sentences_allagree' at /root/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141 (last modified on Thu Jul 31 05:47:32 2025).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3403bf3d718042018b0531848cc30209",
"model_id": "867f7bbb679d4b6eae344812fb797c19",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
"Map: 0%| | 0/2037 [00:00<?, ? examples/s]"
]
},
"metadata": {},
@ -95,26 +80,12 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d3d5c45e3776469f9560b6eaa9346f8f",
"model_id": "a6964a9de5e64d4e80c1906e2bed9f21",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3 [00:00<?, ?ba/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e9736f26e9aa450b8d65f95c0b9c81cc",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?ba/s]"
"Map: 0%| | 0/227 [00:00<?, ? examples/s]"
]
},
"metadata": {},
@ -123,7 +94,7 @@
{
"data": {
"text/plain": [
"{'sentence': \"The 10,000-odd square metre plot that Stockmann has bought for the Nevsky Center shopping center is located on Nevsky Prospect , St Petersburg 's high street , next to the Vosstaniya Square underground station , in the immediate vicinity of Moscow Station .\",\n",
"{'sentence': 'The bank VTB24 provides mortgage loans to buy apartments in the complex at 11-13 % per annum in rubles .',\n",
" 'label': 1,\n",
" 'text_label': 'neutral'}"
]
@ -159,12 +130,12 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c460989d4ab24e3f97d81ef040b1d1b4",
"model_id": "a867fe83918c435ab8a52bee2737f4f3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/3 [00:00<?, ?ba/s]"
"Running tokenizer on dataset: 0%| | 0/2037 [00:00<?, ? examples/s]"
]
},
"metadata": {},
@ -173,12 +144,12 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1acc389b08b94f8a87900b9fbdbccce4",
"model_id": "97ceaf1285f348bd8272e2bec54050c6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/1 [00:00<?, ?ba/s]"
"Running tokenizer on dataset: 0%| | 0/227 [00:00<?, ? examples/s]"
]
},
"metadata": {},
@ -237,63 +208,10 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"id": "6b3a4090",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [02:21<00:00, 1.81it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:07<00:00, 4.13it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=0: train_ppl=tensor(14.6341, device='cuda:0') train_epoch_loss=tensor(2.6834, device='cuda:0') eval_ppl=tensor(1.0057, device='cuda:0') eval_epoch_loss=tensor(0.0057, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [02:00<00:00, 2.11it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.66it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=1: train_ppl=tensor(1.7576, device='cuda:0') train_epoch_loss=tensor(0.5640, device='cuda:0') eval_ppl=tensor(1.0052, device='cuda:0') eval_epoch_loss=tensor(0.0052, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [01:33<00:00, 2.74it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:04<00:00, 6.23it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=2: train_ppl=tensor(1.3830, device='cuda:0') train_epoch_loss=tensor(0.3243, device='cuda:0') eval_ppl=tensor(1.0035, device='cuda:0') eval_epoch_loss=tensor(0.0035, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"outputs": [],
"source": [
"# training and evaluation\n",
"model = model.to(device)\n",
@ -375,7 +293,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"id": "bd20cd4c",
"metadata": {},
"outputs": [
@ -383,12 +301,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
"9,2M\tbigscience/mt0-large_LORA_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
"9,2M\tbigscience/mt0-large_LORA_SEQ_2_SEQ_LM/adapter_model.safetensors\r\n"
]
}
],
"source": [
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
"ckpt = f\"{peft_model_id}/adapter_model.safetensors\"\n",
"!du -h $ckpt"
]
},
@ -473,7 +391,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.11.13"
},
"vscode": {
"interpreter": {

View File

@ -1,253 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "71fbfca2",
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoModelForSeq2SeqLM\n",
"from peft import PeftModel, PeftConfig\n",
"import torch\n",
"from datasets import load_dataset\n",
"import os\n",
"from transformers import AutoTokenizer\n",
"from torch.utils.data import DataLoader\n",
"from transformers import default_data_collator, get_linear_schedule_with_warmup\n",
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"dataset_name = \"twitter_complaints\"\n",
"text_column = \"Tweet text\"\n",
"label_column = \"text_label\"\n",
"batch_size = 8\n",
"\n",
"peft_model_id = \"smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM\"\n",
"config = PeftConfig.from_pretrained(peft_model_id)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cc55820a",
"metadata": {},
"outputs": [],
"source": [
"peft_model_id = \"smangrul/twitter_complaints_bigscience_T0_3B_LORA_SEQ_2_SEQ_LM\"\n",
"max_memory = {0: \"6GIB\", 1: \"0GIB\", 2: \"0GIB\", 3: \"0GIB\", 4: \"0GIB\", \"cpu\": \"30GB\"}\n",
"config = PeftConfig.from_pretrained(peft_model_id)\n",
"model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, device_map=\"auto\", max_memory=max_memory)\n",
"model = PeftModel.from_pretrained(model, peft_model_id, device_map=\"auto\", max_memory=max_memory)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e1a3648b",
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
"dataset = load_dataset(\"ought/raft\", dataset_name)\n",
"\n",
"classes = [k.replace(\"_\", \" \") for k in dataset[\"train\"].features[\"Label\"].names]\n",
"print(classes)\n",
"dataset = dataset.map(\n",
" lambda x: {\"text_label\": [classes[label] for label in x[\"Label\"]]},\n",
" batched=True,\n",
" num_proc=1,\n",
")\n",
"print(dataset)\n",
"dataset[\"train\"][0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe12d4d3",
"metadata": {},
"outputs": [],
"source": [
"tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)\n",
"target_max_length = max([len(tokenizer(class_label)[\"input_ids\"]) for class_label in classes])\n",
"\n",
"\n",
"def preprocess_function(examples):\n",
" inputs = examples[text_column]\n",
" targets = examples[label_column]\n",
" model_inputs = tokenizer(inputs, truncation=True)\n",
" labels = tokenizer(\n",
" targets, max_length=target_max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\"\n",
" )\n",
" labels = labels[\"input_ids\"]\n",
" labels[labels == tokenizer.pad_token_id] = -100\n",
" model_inputs[\"labels\"] = labels\n",
" return model_inputs\n",
"\n",
"\n",
"processed_datasets = dataset.map(\n",
" preprocess_function,\n",
" batched=True,\n",
" num_proc=1,\n",
" remove_columns=dataset[\"train\"].column_names,\n",
" load_from_cache_file=True,\n",
" desc=\"Running tokenizer on dataset\",\n",
")\n",
"\n",
"train_dataset = processed_datasets[\"train\"]\n",
"eval_dataset = processed_datasets[\"train\"]\n",
"test_dataset = processed_datasets[\"test\"]\n",
"\n",
"\n",
"def collate_fn(examples):\n",
" return tokenizer.pad(examples, padding=\"longest\", return_tensors=\"pt\")\n",
"\n",
"\n",
"train_dataloader = DataLoader(\n",
" train_dataset, shuffle=True, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True\n",
")\n",
"eval_dataloader = DataLoader(eval_dataset, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True)\n",
"test_dataloader = DataLoader(test_dataset, collate_fn=collate_fn, batch_size=batch_size, pin_memory=True)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b33be5e6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@NYTsupport i have complained a dozen times &amp; yet my papers are still thrown FAR from my door. Why is this so hard to resolve?\n",
"{'input_ids': tensor([[25335, 1499, 3, 10, 3320, 12056, 382, 20390, 3, 23,\n",
" 43, 25932, 3, 9, 9611, 648, 3, 184, 4624, 117,\n",
" 780, 82, 5778, 33, 341, 3, 12618, 377, 4280, 45,\n",
" 82, 1365, 5, 1615, 19, 48, 78, 614, 12, 7785,\n",
" 58, 16229, 3, 10, 3, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
" 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}\n",
"tensor([[ 0, 10394, 1]], device='cuda:0')\n",
"['complaint']\n"
]
}
],
"source": [
"model.eval()\n",
"i = 15\n",
"inputs = tokenizer(f'{text_column} : {dataset[\"test\"][i][\"Tweet text\"]} Label : ', return_tensors=\"pt\")\n",
"print(dataset[\"test\"][i][\"Tweet text\"])\n",
"print(inputs)\n",
"\n",
"with torch.no_grad():\n",
" outputs = model.generate(input_ids=inputs[\"input_ids\"].to(\"cuda\"), max_new_tokens=10)\n",
" print(outputs)\n",
" print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b6d6cd5b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" 0%| | 0/7 [00:00<?, ?it/s]You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n",
"100%|████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:10<00:00, 1.48s/it]\n"
]
}
],
"source": [
"model.eval()\n",
"eval_preds = []\n",
"for _, batch in enumerate(tqdm(eval_dataloader)):\n",
" batch = {k: v.to(\"cuda\") for k, v in batch.items() if k != \"labels\"}\n",
" with torch.no_grad():\n",
" outputs = model.generate(**batch, max_new_tokens=10)\n",
" preds = outputs.detach().cpu().numpy()\n",
" eval_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "61264abe",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"accuracy=100.0\n",
"eval_preds[:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n",
"dataset['train'][label_column][:10]=['no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint', 'no complaint', 'no complaint', 'complaint', 'complaint', 'no complaint']\n"
]
}
],
"source": [
"correct = 0\n",
"total = 0\n",
"for pred, true in zip(eval_preds, dataset[\"train\"][label_column]):\n",
" if pred.strip() == true.strip():\n",
" correct += 1\n",
" total += 1\n",
"accuracy = correct / total * 100\n",
"print(f\"{accuracy=}\")\n",
"print(f\"{eval_preds[:10]=}\")\n",
"print(f\"{dataset['train'][label_column][:10]=}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a70802a3",
"metadata": {},
"outputs": [],
"source": [
"model.eval()\n",
"test_preds = []\n",
"\n",
"for _, batch in enumerate(tqdm(test_dataloader)):\n",
" batch = {k: v for k, v in batch.items() if k != \"labels\"}\n",
" with torch.no_grad():\n",
" outputs = model.generate(**batch, max_new_tokens=10)\n",
" preds = outputs.detach().cpu().numpy()\n",
" test_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))\n",
" if len(test_preds) > 100:\n",
" break\n",
"test_preds"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5 (v3.10.5:f377153967, Jun 6 2022, 12:36:10) [Clang 13.0.0 (clang-1300.0.29.30)]"
},
"vscode": {
"interpreter": {
"hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@ -54,10 +54,12 @@ def b2mb(x):
# This context manager is used to track the peak memory usage of the process
class TorchTracemalloc:
def __enter__(self):
self.device_type = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
self.device_module = getattr(torch, self.device_type, torch.cuda)
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero
self.begin = torch.cuda.memory_allocated()
self.device_module.empty_cache()
self.device_module.reset_peak_memory_stats() # reset the peak gauge to zero
self.begin = self.device_module.memory_allocated()
self.process = psutil.Process()
self.cpu_begin = self.cpu_mem_used()
@ -87,9 +89,9 @@ class TorchTracemalloc:
self.peak_monitoring = False
gc.collect()
torch.cuda.empty_cache()
self.end = torch.cuda.memory_allocated()
self.peak = torch.cuda.max_memory_allocated()
self.device_module.empty_cache()
self.end = self.device_module.memory_allocated()
self.peak = self.device_module.max_memory_allocated()
self.used = b2mb(self.end - self.begin)
self.peaked = b2mb(self.peak - self.begin)
@ -116,7 +118,14 @@ def main():
do_test = False
set_seed(seed)
dataset = load_dataset("ought/raft", dataset_name)
dataset = load_dataset(
"parquet",
data_files={
"train": f"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/train/0000.parquet",
"test": f"hf://datasets/ought/raft@refs/convert/parquet/{dataset_name}/test/0000.parquet",
},
)
classes = [k.replace("_", " ") for k in dataset["train"].features["Label"].names]
dataset = dataset.map(
lambda x: {"text_label": [classes[label] for label in x["Label"]]},
@ -199,12 +208,18 @@ def main():
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(f"GPU Memory before entering the train : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the train (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}")
# Printing the device memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(
f"GPU Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
f"{accelerator.device.type.upper()} Memory before entering the train : {b2mb(tracemalloc.begin)}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Memory consumed at the end of the train (end-begin): {tracemalloc.used}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the train : {b2mb(tracemalloc.cpu_begin)}")
@ -230,12 +245,18 @@ def main():
preds = accelerator.gather_for_metrics(outputs).detach().cpu().numpy()
eval_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(f"GPU Memory before entering the eval : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the eval (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the eval (max-begin): {tracemalloc.peaked}")
# Printing the device memory usage details such as allocated memory, peak memory, and total memory usage
accelerator.print(
f"GPU Total Peak Memory consumed during the eval (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
f"{accelerator.device.type.upper()} Memory before entering the eval : {b2mb(tracemalloc.begin)}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Memory consumed at the end of the eval (end-begin): {tracemalloc.used}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Peak Memory consumed during the eval (max-begin): {tracemalloc.peaked}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Total Peak Memory consumed during the eval (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the eval : {b2mb(tracemalloc.cpu_begin)}")

View File

@ -2,26 +2,10 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "5f93b7d1",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===================================BUG REPORT===================================\n",
"Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
"For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link\n",
"================================================================================\n",
"CUDA SETUP: CUDA runtime path found: /home/sourab/miniconda3/envs/ml/lib/libcudart.so\n",
"CUDA SETUP: Highest compute capability among GPUs detected: 7.5\n",
"CUDA SETUP: Detected CUDA version 117\n",
"CUDA SETUP: Loading binary /home/sourab/miniconda3/envs/ml/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
]
}
],
"outputs": [],
"source": [
"from transformers import AutoModelForSeq2SeqLM\n",
"from peft import get_peft_config, get_peft_model, get_peft_model_state_dict, PrefixTuningConfig, TaskType\n",
@ -30,14 +14,13 @@
"import os\n",
"\n",
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"3\"\n",
"from transformers import AutoTokenizer\n",
"from torch.utils.data import DataLoader\n",
"from transformers import default_data_collator, get_linear_schedule_with_warmup\n",
"from tqdm import tqdm\n",
"from datasets import load_dataset\n",
"\n",
"device = \"cuda\"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"model_name_or_path = \"t5-large\"\n",
"tokenizer_name_or_path = \"t5-large\"\n",
"\n",
@ -52,7 +35,7 @@
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 2,
"id": "8d0850ac",
"metadata": {},
"outputs": [],
@ -76,18 +59,19 @@
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset financial_phrasebank (/home/sourab/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
"Using the latest cached version of the dataset since financial_phrasebank couldn't be found on the Hugging Face Hub\n",
"Found the latest cached dataset configuration 'sentences_allagree' at /root/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141 (last modified on Thu Jul 31 06:23:15 2025).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ec4be98991b84181bfa75f8846422b8b",
"model_id": "3b321971d6f942418bd5ef6105a1aa65",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
"Map: 0%| | 0/2037 [00:00<?, ? examples/s]"
]
},
"metadata": {},
@ -96,26 +80,12 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "82a6bd694c4f4751a23c370ab51f01a4",
"model_id": "5997543529a849bf97719e59a5ec95b2",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/3 [00:00<?, ?ba/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3844878631534468a1495e435563e4b0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?ba/s]"
"Map: 0%| | 0/227 [00:00<?, ? examples/s]"
]
},
"metadata": {},
@ -124,7 +94,7 @@
{
"data": {
"text/plain": [
"{'sentence': 'Finnish elevators and escalators maker KONE Corporation said on Tuesday ( 18 March ) that it has received a major order from Sir Robert McAlpine to supply all elevators and escalators for the Watermark Place project in the City of London .',\n",
"{'sentence': \"Progress Group , QPR 's representative in Saudi Arabia and North Africa , has signed a framework agreement for a long term strategic relationship with ISE .\",\n",
" 'label': 2,\n",
" 'text_label': 'positive'}"
]
@ -157,27 +127,15 @@
"id": "adf9608c",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/sourab/transformers/src/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.\n",
"For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.\n",
"- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.\n",
"- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.\n",
"- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.\n",
" warnings.warn(\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "4af8c12efb5643659573347509079f3a",
"model_id": "ee9bf13a2e3f4812a51a87346a4614f3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/3 [00:00<?, ?ba/s]"
"spiece.model: 0%| | 0.00/792k [00:00<?, ?B/s]"
]
},
"metadata": {},
@ -186,12 +144,40 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "86033b6257384584afd034075af808cb",
"model_id": "332fcaa33dc343e7a20b24cec7ec97e9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/1 [00:00<?, ?ba/s]"
"tokenizer.json: 0%| | 0.00/1.39M [00:00<?, ?B/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a2135262c7a44377b35fe32b8d86d6c6",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/2037 [00:00<?, ? examples/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "dc8598b160484939b27c65be001c694c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Running tokenizer on dataset: 0%| | 0/227 [00:00<?, ? examples/s]"
]
},
"metadata": {},
@ -250,86 +236,10 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": null,
"id": "6b3a4090",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:49<00:00, 5.15it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:03<00:00, 7.56it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=0: train_ppl=tensor(2760654.5000, device='cuda:0') train_epoch_loss=tensor(14.8310, device='cuda:0') eval_ppl=tensor(1.0124, device='cuda:0') eval_epoch_loss=tensor(0.0124, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:40<00:00, 6.22it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.05it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=1: train_ppl=tensor(2.7329, device='cuda:0') train_epoch_loss=tensor(1.0054, device='cuda:0') eval_ppl=tensor(1.0081, device='cuda:0') eval_epoch_loss=tensor(0.0080, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:58<00:00, 4.36it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.05it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=2: train_ppl=tensor(2.1698, device='cuda:0') train_epoch_loss=tensor(0.7747, device='cuda:0') eval_ppl=tensor(1.0057, device='cuda:0') eval_epoch_loss=tensor(0.0057, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:58<00:00, 4.35it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:05<00:00, 5.06it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=3: train_ppl=tensor(2.0724, device='cuda:0') train_epoch_loss=tensor(0.7287, device='cuda:0') eval_ppl=tensor(1.0051, device='cuda:0') eval_epoch_loss=tensor(0.0051, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████| 255/255 [01:02<00:00, 4.10it/s]\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:06<00:00, 4.74it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=4: train_ppl=tensor(1.7598, device='cuda:0') train_epoch_loss=tensor(0.5652, device='cuda:0') eval_ppl=tensor(1.0047, device='cuda:0') eval_epoch_loss=tensor(0.0047, device='cuda:0')\n"
]
}
],
"outputs": [],
"source": [
"# training and evaluation\n",
"model = model.to(device)\n",
@ -411,7 +321,7 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": null,
"id": "bd20cd4c",
"metadata": {},
"outputs": [
@ -419,12 +329,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
"3,8M\tt5-large_PREFIX_TUNING_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
"3,8M\tt5-large_PREFIX_TUNING_SEQ_2_SEQ_LM/adapter_model.safetensors\r\n"
]
}
],
"source": [
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
"ckpt = f\"{peft_model_id}/adapter_model.safetensors\"\n",
"!du -h $ckpt"
]
},
@ -503,7 +413,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.5"
"version": "3.11.13"
},
"vscode": {
"interpreter": {

View File

@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"id": "5f93b7d1",
"metadata": {
"ExecuteTime": {
@ -10,50 +10,7 @@
"start_time": "2023-05-30T08:37:56.881307Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"===================================BUG REPORT===================================\n",
"Welcome to bitsandbytes. For bug reports, please run\n",
"\n",
"python -m bitsandbytes\n",
"\n",
" and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n",
"================================================================================\n",
"bin /udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so\n",
"CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...\n",
"CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0\n",
"CUDA SETUP: Highest compute capability among GPUs detected: 8.0\n",
"CUDA SETUP: Detected CUDA version 117\n",
"CUDA SETUP: Loading binary /udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /udir/tschilla/anaconda3 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('Europe/Paris')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/udir/tschilla/.cache/dotnet_bundle_extract')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('5002'), PosixPath('http'), PosixPath('//127.0.0.1')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('() { ( alias;\\n eval ${which_declare} ) | /usr/bin/which --tty-only --read-alias --read-functions --show-tilde --show-dot $@\\n}')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//matplotlib_inline.backend_inline')}\n",
" warn(msg)\n",
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.\n",
"Either way, this might cause trouble in the future:\n",
"If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.\n",
" warn(msg)\n"
]
}
],
"outputs": [],
"source": [
"import os\n",
"\n",
@ -66,7 +23,7 @@
"\n",
"os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
"\n",
"device = \"cuda\"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"model_name_or_path = \"t5-large\"\n",
"tokenizer_name_or_path = \"t5-large\"\n",
"\n",
@ -94,19 +51,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"trainable params: 40960 || all params: 737709056 || trainable%: 0.005552324411210698\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/udir/tschilla/anaconda3/envs/peft/lib/python3.9/site-packages/transformers/models/t5/tokenization_t5_fast.py:155: FutureWarning: This tokenizer was incorrectly instantiated with a model max length of 512 which will be corrected in Transformers v5.\n",
"For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.\n",
"- Be aware that you SHOULD NOT rely on t5-large automatically truncating your input to 512 when padding/encoding.\n",
"- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.\n",
"- To avoid this warning, please instantiate this tokenizer with `model_max_length` set to your preferred value.\n",
" warnings.warn(\n"
"trainable params: 40,960 || all params: 737,709,056 || trainable%: 0.0056\n"
]
},
{
@ -295,27 +240,14 @@
"name": "stderr",
"output_type": "stream",
"text": [
"Found cached dataset financial_phrasebank (/data/proxem/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141)\n"
"Using the latest cached version of the dataset since financial_phrasebank couldn't be found on the Hugging Face Hub\n",
"Found the latest cached dataset configuration 'sentences_allagree' at /root/.cache/huggingface/datasets/financial_phrasebank/sentences_allagree/1.0.0/550bde12e6c30e2674da973a55f57edde5181d53f5a5a34c1531c53f93b7e141 (last modified on Thu Jul 31 06:37:38 2025).\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "fb63f50cb7cb4f5aae10648ba74d6c4e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"model_id": "2b64258700bd40548ddcd626f3920c9a",
"version_major": 2,
"version_minor": 0
},
@ -329,7 +261,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"model_id": "d95de9b1dc3d417da35118c14d02b986",
"version_major": 2,
"version_minor": 0
},
@ -343,9 +275,9 @@
{
"data": {
"text/plain": [
"{'sentence': '`` Lining stone sales were also good in the early autumn , and order books are strong to the end of the year .',\n",
" 'label': 2,\n",
" 'text_label': 'positive'}"
"{'sentence': 'The 2500-passenger ferry will have dimensions of 185 m length overall , 170 m length between perpendiculars , 27.70 m breadth and 6.55 m design draught .',\n",
" 'label': 1,\n",
" 'text_label': 'neutral'}"
]
},
"execution_count": 3,
@ -384,7 +316,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"model_id": "a78b4b1a041546248b8e6703eaec0969",
"version_major": 2,
"version_minor": 0
},
@ -398,7 +330,7 @@
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "",
"model_id": "bbe2962a25144b5488c297e186df824f",
"version_major": 2,
"version_minor": 0
},
@ -470,7 +402,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": null,
"id": "6b3a4090",
"metadata": {
"ExecuteTime": {
@ -478,90 +410,7 @@
"start_time": "2023-05-30T08:38:50.102263Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:42<00:00, 6.05it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.40it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=0: train_ppl=tensor(8.0846, device='cuda:0') train_epoch_loss=tensor(2.0900, device='cuda:0') eval_ppl=tensor(1.3542, device='cuda:0') eval_epoch_loss=tensor(0.3032, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:41<00:00, 6.15it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.42it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=1: train_ppl=tensor(1.5088, device='cuda:0') train_epoch_loss=tensor(0.4113, device='cuda:0') eval_ppl=tensor(1.2692, device='cuda:0') eval_epoch_loss=tensor(0.2384, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:41<00:00, 6.18it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.45it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=2: train_ppl=tensor(1.5322, device='cuda:0') train_epoch_loss=tensor(0.4267, device='cuda:0') eval_ppl=tensor(1.2065, device='cuda:0') eval_epoch_loss=tensor(0.1877, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:41<00:00, 6.17it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.38it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=3: train_ppl=tensor(1.4475, device='cuda:0') train_epoch_loss=tensor(0.3699, device='cuda:0') eval_ppl=tensor(1.2346, device='cuda:0') eval_epoch_loss=tensor(0.2107, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 255/255 [00:42<00:00, 5.94it/s]\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 29/29 [00:02<00:00, 14.42it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch=4: train_ppl=tensor(1.3428, device='cuda:0') train_epoch_loss=tensor(0.2948, device='cuda:0') eval_ppl=tensor(1.2041, device='cuda:0') eval_epoch_loss=tensor(0.1857, device='cuda:0')\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"outputs": [],
"source": [
"# training and evaluation\n",
"model = model.to(device)\n",
@ -653,7 +502,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": null,
"id": "bd20cd4c",
"metadata": {
"ExecuteTime": {
@ -666,12 +515,12 @@
"name": "stdout",
"output_type": "stream",
"text": [
"164K\tt5-large_PROMPT_TUNING_SEQ_2_SEQ_LM/adapter_model.bin\r\n"
"164K\tt5-large_PROMPT_TUNING_SEQ_2_SEQ_LM/adapter_model.safetensors\r\n"
]
}
],
"source": [
"ckpt = f\"{peft_model_id}/adapter_model.bin\"\n",
"ckpt = f\"{peft_model_id}/adapter_model.safetensors\"\n",
"!du -h $ckpt"
]
},
@ -735,9 +584,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "peft",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "peft"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@ -749,7 +598,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
"version": "3.11.13"
},
"toc": {
"base_numbering": 1,

File diff suppressed because one or more lines are too long

View File

@ -3,4 +3,6 @@ accelerate
evaluate
deepspeed
tqdm
datasets
datasets
safetensors
scikit-learn

View File

@ -114,7 +114,7 @@ trainer = SFTTrainer(
model=peft_model,
args=training_args,
train_dataset=dataset,
tokenizer=tokenizer,
processing_class=tokenizer,
)
trainer.train()
peft_model.save_pretrained("corda-llama-2-7b")
@ -150,7 +150,10 @@ corda_config = CordaConfig(
#### Knowledge-preserved adaptation mode
```bash
CUDA_VISIBLE_DEVICES=0 python -u preprocess.py --model_id="meta-llama/Llama-2-7b-hf" \
export CUDA_VISIBLE_DEVICES=0 # force use of CUDA GPU device 0
export ZE_AFFINITY_MASK=0 # force use of Intel XPU device 0
python -u preprocess.py --model_id="meta-llama/Llama-2-7b-hf" \
--r 128 --seed 233 \
--save_model --save_path {path_to_residual_model} \
--calib_dataset "nqopen"
@ -165,7 +168,10 @@ Arguments:
#### Instruction-previewed adaptation mode
```bash
CUDA_VISIBLE_DEVICES=0 python -u preprocess.py --model_id="meta-llama/Llama-2-7b-hf" \
export CUDA_VISIBLE_DEVICES=0 # force use of CUDA GPU device 0
export ZE_AFFINITY_MASK=0 # force use of Intel XPU device 0
python -u preprocess.py --model_id="meta-llama/Llama-2-7b-hf" \
--r 128 --seed 233 \
--save_model --save_path {path_to_residual_model} \
--first_eigen --calib_dataset "MetaMATH"
@ -248,4 +254,4 @@ Note that this conversion is not supported if `rslora` is used in combination wi
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
}
```
```

View File

@ -266,7 +266,7 @@ def train():
"train_dataset": train_dataset,
"data_collator": data_collator,
}
trainer = Trainer(model=model, tokenizer=tokenizer, args=script_args, **data_module)
trainer = Trainer(model=model, processing_class=tokenizer, args=script_args, **data_module)
trainer.train()
trainer.save_state()
model.save_pretrained(os.path.join(script_args.output_dir, "ft"))

View File

@ -38,8 +38,11 @@ def main(args):
# Setting random seed of numpy and torch
np.random.seed(args.seed)
torch.manual_seed(args.seed)
torch.cuda.manual_seed_all(args.seed)
torch.backends.cudnn.deterministic = True
if torch.cuda.is_available():
torch.cuda.manual_seed_all(args.seed)
elif torch.xpu.is_available():
torch.xpu.manual_seed_all(args.seed)
torch.use_deterministic_algorithms(True)
# Load model
model_id = args.model_id

View File

@ -1129,7 +1129,7 @@
"# Convert the test dataset to a CPT-compatible format\n",
"cpt_test_dataset = CPTDataset(test_dataset, tokenizer, templates)\n",
"\n",
"# Get the device where the model is loaded (CPU or GPU)\n",
"# Get the device where the model is loaded (CPU, GPU or XPU)\n",
"device = model.device\n",
"list_bool_predictions = []\n",
"\n",
@ -1552,4 +1552,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@ -114,7 +114,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": null,
"id": "ca43b893-2d66-4e93-a08f-b17a92040709",
"metadata": {
"colab": {
@ -185,7 +185,8 @@
],
"source": [
"lm.eval()\n",
"lm.to(\"cuda\");"
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"lm.to(device);"
]
},
{
@ -210,7 +211,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"id": "f5c0b3df-911a-4645-9140-99ee489515e8",
"metadata": {
"colab": {
@ -327,7 +328,8 @@
"source": [
"from datasets import load_dataset\n",
"\n",
"raw_data = load_dataset(\"InstaDeepAI/nucleotide_transformer_downstream_tasks\", \"H3\")"
"raw_data_full = load_dataset(\"InstaDeepAI/nucleotide_transformer_downstream_tasks\")\n",
"raw_data = raw_data_full.filter(lambda example: example['task'] == 'H3')"
]
},
{
@ -592,7 +594,7 @@
},
{
"cell_type": "code",
"execution_count": 32,
"execution_count": null,
"id": "700540f4-0ab8-4f8a-a75c-416a6908af47",
"metadata": {
"colab": {
@ -720,7 +722,7 @@
"# Number of classes for your classification task\n",
"num_labels = 2\n",
"classification_model = DNA_LM(lm, num_labels)\n",
"classification_model.to('cuda');"
"classification_model.to(device);"
]
},
{
@ -991,7 +993,7 @@
},
{
"cell_type": "code",
"execution_count": 40,
"execution_count": null,
"id": "021641ae-f604-4d69-8724-743b7d7c613c",
"metadata": {
"colab": {
@ -1094,7 +1096,7 @@
"# Number of classes for your classification task\n",
"num_labels = 2\n",
"classification_model = DNA_LM(lm, num_labels)\n",
"classification_model.to('cuda');"
"classification_model.to(device);"
]
},
{

View File

@ -59,7 +59,7 @@ def main():
)
parser.add_argument("--ephemeral_gpu_offload", action="store_true", help="Use ephemeral GPU offloading")
parser.add_argument(
"--merge_model_path", type="str", help="Merge the model with the DoRA model and save to the given path"
"--merge_model_path", type=str, help="Merge the model with the DoRA model and save to the given path"
)
args = parser.parse_args()

View File

@ -60,8 +60,9 @@ peft_config = LoraConfig(
eva_config=eva_config
)
# move model to GPU
model = model.cuda()
# move model to accelerator
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model = model.to(device)
# to optimize memory usage during EVA initialization, set low_cpu_mem_usage=True
peft_model = get_peft_model(model, peft_config, low_cpu_mem_usage=True)
@ -90,7 +91,7 @@ In some cases you might just want to get the state_dict after EVA initialization
- you want to precompute and store the state_dict for different downstream tasks.
- you need to quantize the model for finetuning but want to perform EVA initialization with model weights in full/half precision.
- you do not intend to use a peft model for LoRA finetuning.
- you would like to leverage multiple GPUs for EVA initialization. (At the moment this is not directly supported by `initialize_lora_eva_weights`)
- you would like to leverage multiple accelerators for EVA initialization. (At the moment this is not directly supported by `initialize_lora_eva_weights`)
You can do this by calling `get_eva_state_dict` directly (you only need to pass `peft_config` if `model` is not a PeftModel):
```python
@ -103,9 +104,9 @@ Later you can load the state_dict into a `PeftModel` by using the `eva_state_dic
initialize_lora_eva_weights(peft_model, eva_state_dict=eva_state_dict)
```
## Leveraging multiple GPUs
## Leveraging multiple accelerators
EVA initialization can be parallelized across multiple GPUs. In this case inputs from multiple GPUs are gathered before computing the SVD for the batch. This requires that the model is wrapped in a `torch.nn.DataParallel` or `torch.nn.DistributedDataParallel` class. An example of how to use this can be found in [eva_finetuning_multi_gpu.py](https://github.com/huggingface/peft/blob/main/examples/eva_finetuning/eva_finetuning_multi_gpu.py).
EVA initialization can be parallelized across multiple accelerators. In this case inputs from multiple accelerators are gathered before computing the SVD for the batch. This requires that the model is wrapped in a `torch.nn.DataParallel` or `torch.nn.DistributedDataParallel` class. An example of how to use this can be found in [eva_finetuning_multi_accelerator.py](https://github.com/huggingface/peft/blob/main/examples/eva_finetuning/eva_finetuning_multi_accelerator.py).
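A minimal sketch of that setup is shown below, assuming a torchrun-style launch and the `model`, `dataloader`, and `peft_config` objects from the earlier example; the exact signature of `get_eva_state_dict` should be checked against the PEFT documentation before relying on it.

```python
# Hedged sketch only (CUDA shown; XPU follows the same pattern with the "xccl"
# backend). Assumes torchrun sets LOCAL_RANK and that `model`, `dataloader`
# and `peft_config` are defined as in the single-device example above.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

from peft import get_eva_state_dict

local_rank = int(os.environ.get("LOCAL_RANK", 0))
dist.init_process_group("nccl")
torch.cuda.set_device(local_rank)

# EVA gathers inputs from all ranks before computing the per-batch SVD,
# so the model must be wrapped in DataParallel or DistributedDataParallel.
ddp_model = DDP(model.to(local_rank), device_ids=[local_rank])
eva_state_dict = get_eva_state_dict(ddp_model, dataloader, peft_config=peft_config)
```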
## Customizing EVA

View File

@ -21,8 +21,7 @@ from utils import DataCollator, TokenizerMetaMath
from peft import EvaConfig, LoraConfig, get_peft_model, initialize_lora_eva_weights
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DEVICE = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
# config
model_name = "meta-llama/Llama-3.1-8B"
@ -69,7 +68,7 @@ peft_config = LoraConfig(
r=rank, lora_alpha=alpha, target_modules=target_modules, init_lora_weights="eva", eva_config=eva_config
)
# move model to GPU
# move model to accelerator
model = model.to(DEVICE)
# to optimize memory usage during eva initialization, set low_cpu_mem_usage=True

View File

@ -50,6 +50,11 @@ if torch.cuda.is_available():
torch.cuda.set_device(local_rank)
dist.init_process_group("nccl")
world_size = dist.get_world_size()
elif torch.xpu.is_available():
local_rank = int(os.environ.get("LOCAL_RANK", -1))
torch.xpu.set_device(local_rank)
dist.init_process_group("xccl")
world_size = dist.get_world_size()
else:
local_rank = -1
world_size = 1

View File

@ -416,8 +416,7 @@ def main():
for epoch in range(starting_epoch, args.num_train_epochs):
model.train()
if args.with_tracking:
total_loss = 0
total_loss = 0
if args.resume_from_checkpoint and epoch == starting_epoch and resume_step is not None:
# We skip the first `n` batches in the dataloader when resuming from a checkpoint
active_dataloader = accelerator.skip_first_batches(train_dataloader, resume_step)

View File

@ -1,7 +1,7 @@
git+https://github.com/huggingface/peft
git+https://github.com/huggingface/accelerate
git+https://github.com/huggingface/transformers
datasets
peft
accelerate
transformers
datasets==2.18.0
evaluate
hnswlib
pandas

View File

@ -9,8 +9,8 @@ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # force to use CUDA GPU device 0
os.environ["ZE_AFFINITY_MASK"] = "0" # force to use Intel XPU device 0
# -*- coding: utf-8 -*-
"""Finetune-opt-bnb-peft.ipynb
@ -36,11 +36,12 @@ First, run the cells below to install the requirements:
Here let's load the `opt-6.7b` model, its weights in half-precision (float16) are about 13GB on the Hub! If we load them in 8-bit we would require around 7GB of memory instead.
"""
free_in_GB = int(torch.cuda.mem_get_info()[0] / 1024**3)
device_type = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
device_module = getattr(torch, device_type, torch.cuda)
free_in_GB = int(device_module.mem_get_info()[0] / 1024**3)
max_memory = f"{free_in_GB - 2}GB"
n_gpus = torch.cuda.device_count()
n_gpus = device_module.device_count()
max_memory = {i: max_memory for i in range(n_gpus)}
model = AutoModelForCausalLM.from_pretrained(
@ -180,11 +181,12 @@ You can also directly load adapters from the Hub using the commands below:
# You can then directly use the trained model or the model that you have loaded from the 🤗 Hub for inference as you would do it usually in `transformers`.
# """
#
batch = tokenizer("Two things are infinite: ", return_tensors="pt")
batch = tokenizer("Two things are infinite: ", return_tensors="pt").to(model.device)
model.config.use_cache = False # silence the warnings. Please re-enable for inference!
model.eval()
with torch.cuda.amp.autocast():
with torch.amp.autocast(device_type=device_type):
output_tokens = model.generate(**batch, max_new_tokens=50)
print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=True))

View File

@ -88,18 +88,19 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": null,
"id": "98a0d8ac",
"metadata": {},
"outputs": [],
"source": [
"prompt = \"a purple qwe backpack.\"\n",
"negative_prompt = \"low quality, blurry, unfinished\""
"negative_prompt = \"low quality, blurry, unfinished\"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"id": "d4e888d2",
"metadata": {},
"outputs": [
@ -121,7 +122,7 @@
],
"source": [
"%%time\n",
"pipe = get_hra_sd_pipeline(OUTPUT_DIR, MODEL_NAME, EPOCH_IDX, adapter_name=RUN_NAME)"
"pipe = get_hra_sd_pipeline(OUTPUT_DIR, MODEL_NAME, EPOCH_IDX, adapter_name=RUN_NAME, device=device)"
]
},
{

View File

@ -1,13 +1,12 @@
transformers=>4.48.0
accelerate==0.25.0
transformers==4.55.0
accelerate==1.9.0
evaluate
tqdm
datasets==2.16.1
diffusers==0.17.1
datasets==4.0.0
diffusers==0.34.0
Pillow
huggingface_hub
safetensors
nb_conda_kernels
ipykernel
ipywidgets
wandb==0.16.1
wandb==0.21.0

View File

@ -141,7 +141,7 @@ def main(args):
cur_class_images = len(list(class_images_dir.iterdir()))
if cur_class_images < args.num_class_images:
torch_dtype = torch.float16 if accelerator.device.type == "cuda" else torch.float32
torch_dtype = torch.float16 if accelerator.device.type in ["cuda", "xpu"] else torch.float32
if args.prior_generation_precision == "fp32":
torch_dtype = torch.float32
elif args.prior_generation_precision == "fp16":
@ -178,6 +178,8 @@ def main(args):
del pipeline
if torch.cuda.is_available():
torch.cuda.empty_cache()
elif torch.xpu.is_available():
torch.xpu.empty_cache()
# Handle the repository creation
if accelerator.is_main_process:
@ -261,7 +263,9 @@ def main(args):
text_encoder.to(accelerator.device, dtype=weight_dtype)
if args.enable_xformers_memory_efficient_attention:
if is_xformers_available():
if accelerator.device.type == "xpu":
logger.warning("XPU hasn't support xformers yet, ignore it.")
elif is_xformers_available():
unet.enable_xformers_memory_efficient_attention()
else:
raise ValueError("xformers is not available. Make sure it is installed correctly")
@ -578,18 +582,26 @@ def main(args):
)
del pipeline
torch.cuda.empty_cache()
if torch.cuda.is_available():
torch.cuda.empty_cache()
elif torch.xpu.is_available():
torch.xpu.empty_cache()
if global_step >= args.max_train_steps:
break
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
# Printing the device memory usage details such as allocated memory, peak memory, and total memory usage
if not args.no_tracemalloc:
accelerator.print(f"GPU Memory before entering the train : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the train (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}")
accelerator.print(
f"GPU Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
f"{accelerator.device.type.upper()} Memory before entering the train : {b2mb(tracemalloc.begin)}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Memory consumed at the end of the train (end-begin): {tracemalloc.used}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the train : {b2mb(tracemalloc.cpu_begin)}")

View File

@ -15,10 +15,12 @@ def b2mb(x):
# This context manager is used to track the peak memory usage of the process
class TorchTracemalloc:
def __enter__(self):
self.device_type = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
self.device_module = getattr(torch, self.device_type, torch.cuda)
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero
self.begin = torch.cuda.memory_allocated()
self.device_module.empty_cache()
self.device_module.reset_peak_memory_stats() # reset the peak gauge to zero
self.begin = self.device_module.memory_allocated()
self.process = psutil.Process()
self.cpu_begin = self.cpu_mem_used()
@ -48,9 +50,9 @@ class TorchTracemalloc:
self.peak_monitoring = False
gc.collect()
torch.cuda.empty_cache()
self.end = torch.cuda.memory_allocated()
self.peak = torch.cuda.max_memory_allocated()
self.device_module.empty_cache()
self.end = self.device_module.memory_allocated()
self.peak = self.device_module.max_memory_allocated()
self.used = b2mb(self.end - self.begin)
self.peaked = b2mb(self.peak - self.begin)
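For reference, a short hedged usage sketch of the device-agnostic context manager above; `train_one_epoch` is a hypothetical stand-in for the script's actual training loop, and the attribute names come from the hunk itself.

```python
# Hedged usage sketch for the device-agnostic TorchTracemalloc shown above.
with TorchTracemalloc() as tracemalloc:
    train_one_epoch(model, train_dataloader, optimizer)  # hypothetical stand-in

# .used / .peaked are deltas in MB (via b2mb); .begin is the allocation baseline.
print(f"Memory before training: {b2mb(tracemalloc.begin)} MB")
print(f"Memory consumed (end - begin): {tracemalloc.used} MB")
print(f"Peak memory consumed (max - begin): {tracemalloc.peaked} MB")
```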

View File

@ -1142,7 +1142,7 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@ -1747,7 +1747,7 @@
" args,\n",
" train_dataset=train_ds,\n",
" eval_dataset=val_ds,\n",
" tokenizer=image_processor,\n",
" processing_class=image_processor,\n",
" compute_metrics=compute_metrics,\n",
" data_collator=collate_fn,\n",
")\n",

View File

@ -477,7 +477,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": null,
"id": "8cc5c5db",
"metadata": {},
"outputs": [
@ -490,7 +490,7 @@
}
],
"source": [
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"peft_model = peft.get_peft_model(model, config).to(device)\n",
"optimizer = torch.optim.Adam(peft_model.parameters(), lr=2e-4)\n",
"criterion = torch.nn.CrossEntropyLoss()\n",

View File

@ -71,7 +71,8 @@
}
],
"source": [
"!pip install -q bitsandbytes datasets accelerate\n",
"!pip install -q datasets==3.6.0 accelerate\n",
"!pip install -q git+https://github.com/bitsandbytes-foundation/bitsandbytes.git\n",
"!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git@main"
]
},
@ -1000,7 +1001,7 @@
"source": [
"model.eval()\n",
"input_text = \"In January-September 2009 , the Group 's net interest income increased to EUR 112.4 mn from EUR 74.3 mn in January-September 2008 .\"\n",
"inputs = tokenizer(input_text, return_tensors=\"pt\")\n",
"inputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\n",
"\n",
"outputs = model.generate(input_ids=inputs[\"input_ids\"], max_new_tokens=10)\n",
"\n",
@ -1209,7 +1210,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"id": "jmjwWYt0KI_I",
"metadata": {
"colab": {
@ -1247,7 +1248,7 @@
"source": [
"model.eval()\n",
"input_text = \"In January-September 2009 , the Group 's net interest income increased to EUR 112.4 mn from EUR 74.3 mn in January-September 2008 .\"\n",
"inputs = tokenizer(input_text, return_tensors=\"pt\")\n",
"inputs = tokenizer(input_text, return_tensors=\"pt\").to(model.device)\n",
"\n",
"outputs = model.generate(input_ids=inputs[\"input_ids\"], max_new_tokens=10)\n",
"\n",

View File

@ -26,7 +26,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -59,7 +59,8 @@
}
],
"source": [
"!pip install -q bitsandbytes datasets accelerate\n",
"!pip install -q datasets==3.6.0 accelerate\n",
"!pip install -q git+https://github.com/bitsandbytes-foundation/bitsandbytes.git\n",
"!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git"
]
},
@ -1485,7 +1486,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@ -1514,9 +1515,10 @@
}
],
"source": [
"batch = tokenizer(\"Two things are infinite: \", return_tensors=\"pt\")\n",
"batch = tokenizer(\"Two things are infinite: \", return_tensors=\"pt\").to(model.device)\n",
"\n",
"with torch.cuda.amp.autocast():\n",
"device_type = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"with torch.amp.autocast(device_type=device_type):\n",
" output_tokens = model.generate(**batch, max_new_tokens=50)\n",
"\n",
"print(\"\\n\\n\", tokenizer.decode(output_tokens[0], skip_special_tokens=True))"

View File

@ -0,0 +1,19 @@
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_XPU
downcast_bf16: 'no'
enable_cpu_affinity: false
gpu_ids: all
ipex_config:
ipex: false
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

View File

@ -78,7 +78,7 @@ train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=2, collate
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
device = "cuda" if torch.cuda.is_available() else "cpu"
device = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
model.train()

View File

@ -374,8 +374,9 @@ def evaluation_loop(model, eval_dataloader, processor, normalizer, metric, force
references = []
normalized_predictions = []
normalized_references = []
device_type = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
for _, batch in enumerate(tqdm(eval_dataloader)):
with torch.cuda.amp.autocast():
with torch.amp.autocast(device_type=device_type):
with torch.no_grad():
generated_tokens = (
model.generate(
@ -487,10 +488,8 @@ def main():
train_split = "train+validation"
test_split = "test"
raw_datasets["train"] = loading_method(
args.dataset_name, args.language_abbr, split=train_split, use_auth_token=True
)
raw_datasets["test"] = loading_method(args.dataset_name, args.language_abbr, split=test_split, use_auth_token=True)
raw_datasets["train"] = loading_method(args.dataset_name, args.language_abbr, split=train_split)
raw_datasets["test"] = loading_method(args.dataset_name, args.language_abbr, split=test_split)
raw_datasets = raw_datasets.cast_column("audio", Audio(sampling_rate=16000))
logger.info("Dataset loaded: %s", raw_datasets)
@ -540,9 +539,9 @@ def main():
)
model.config.forced_decoder_ids = None
model.config.suppress_tokens = []
if len(set(model.hf_device_map.values()).intersection({"cpu", "disk"})) > 0:
if hasattr(model, "hf_device_map") and len(set(model.hf_device_map.values()).intersection({"cpu", "disk"})) > 0:
raise ValueError("Training on CPU or disk is not supported.")
if len(set(model.hf_device_map.values())) > 1:
if hasattr(model, "hf_device_map") and len(set(model.hf_device_map.values())) > 1:
device_map = model.hf_device_map.copy()
# required because `labels` are on main execution device (0) while the output of `proj_out` is on other device.
# So, this leads to device mismatch error when calculation cross-entropy between logits and labels.
@ -567,6 +566,13 @@ def main():
model.model.encoder.conv1.register_forward_hook(make_inputs_require_grad)
# Calculate total steps first for AdaLoRA
if args.max_train_steps is None:
num_update_steps_per_epoch = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps)
total_steps = args.num_train_epochs * num_update_steps_per_epoch
else:
total_steps = args.max_train_steps
# wrapping model with adalora tuner
if args.use_adalora:
config = AdaLoraConfig(
@ -581,6 +587,7 @@ def main():
lora_dropout=args.lora_dropout,
target_modules=["k_proj", "q_proj", "v_proj", "out_proj", "fc1", "fc2"],
orth_reg_weight=args.orth_reg_weight,
total_step=total_steps,
)
else:
config = LoraConfig(
@ -620,8 +627,14 @@ def main():
# Note here that the max steps is adjusted by the accelerator's num_processes
args.max_train_steps = math.ceil(args.max_train_steps / accelerator.num_processes)
if args.use_peft and args.use_adalora:
model.base_model.peft_config["default"].total_step = args.max_train_steps
# model.base_model.peft_config.total_step = args.max_train_steps
# Update the total_step in the config to reflect the adjusted max_train_steps
# Handle DDP case where model is wrapped
if hasattr(model, "module"):
# DDP case
model.module.base_model.peft_config["default"].total_step = args.max_train_steps
else:
# Non-DDP case
model.base_model.peft_config["default"].total_step = args.max_train_steps
# We need to initialize the trackers we use, and also store our configuration.
# The trackers initializes automatically on the main process.
@ -683,7 +696,21 @@ def main():
# Note that this requires parameter gradients.
# Hence being called before optimizer.zero_grad().
if args.use_peft and args.use_adalora:
model.update_and_allocate(global_step)
# Handle DDP case where model is wrapped
if hasattr(model, "module"):
# DDP case
peft_model = model.module
else:
# Non-DDP case
peft_model = model
# Check if rank_pattern exists before calling update_and_allocate
if (
hasattr(peft_model, "peft_config")
and peft_model.peft_config["default"].rank_pattern is not None
and global_step >= args.tinit # Only start updating after tinit steps
):
peft_model.update_and_allocate(global_step)
optimizer.zero_grad()
global_step += 1
@ -750,7 +777,18 @@ def main():
if args.load_best_model:
# load the best model
accelerator.load_state(os.path.join(args.output_dir, "best_checkpoint"))
model.resize_modules_by_rank_pattern(model.peft_config["default"].rank_pattern, "default")
# Handle DDP case where model is wrapped
if hasattr(model, "module"):
# DDP case
peft_model = model.module
else:
# Non-DDP case
peft_model = model
# Only resize if rank_pattern exists
if hasattr(peft_model, "peft_config") and peft_model.peft_config["default"].rank_pattern is not None:
peft_model.resize_modules_by_rank_pattern(peft_model.peft_config["default"].rank_pattern, "default")
eval_metrics = evaluation_loop(
model, eval_dataloader, processor, normalizer, metric, forced_decoder_ids, accelerator
)

View File

@ -58,13 +58,14 @@
},
"outputs": [],
"source": [
"!pip install datasets>=2.6.1\n",
"!pip install datasets==3.6.0\n",
"!pip install git+https://github.com/huggingface/transformers\n",
"!pip install librosa\n",
"!pip install evaluate>=0.30\n",
"!pip install jiwer\n",
"!pip install gradio\n",
"!pip install -q bitsandbytes datasets accelerate\n",
"!pip install -q datasets accelerate\n",
"!pip install -q git+https://github.com/bitsandbytes-foundation/bitsandbytes.git\n",
"!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git@main"
]
},
@ -426,8 +427,8 @@
"\n",
"common_voice = DatasetDict()\n",
"\n",
"common_voice[\"train\"] = load_dataset(dataset_name, language_abbr, split=\"train+validation\", use_auth_token=True)\n",
"common_voice[\"test\"] = load_dataset(dataset_name, language_abbr, split=\"test\", use_auth_token=True)\n",
"common_voice[\"train\"] = load_dataset(dataset_name, language_abbr, split=\"train+validation\")\n",
"common_voice[\"test\"] = load_dataset(dataset_name, language_abbr, split=\"test\")\n",
"\n",
"print(common_voice)"
]
@ -1323,7 +1324,7 @@
" eval_dataset=common_voice[\"test\"],\n",
" data_collator=data_collator,\n",
" # compute_metrics=compute_metrics,\n",
" tokenizer=processor.feature_extractor,\n",
" processing_class=processor.feature_extractor,\n",
" callbacks=[SavePeftModelCallback],\n",
")\n",
"model.config.use_cache = False # silence the warnings. Please re-enable for inference!"
@ -1586,7 +1587,7 @@
],
"source": [
"model_name_or_path = \"openai/whisper-large-v2\"\n",
"peft_model_id = \"smangrul/\" + f\"{model_name_or_path}-{model.peft_config.peft_type}-colab\".replace(\"/\", \"-\")\n",
"peft_model_id = \"smangrul/\" + f\"{model_name_or_path}-{model.peft_config['default'].peft_type.value}-colab\".replace(\"/\", \"-\")\n",
"model.push_to_hub(peft_model_id)\n",
"print(peft_model_id)"
]
@ -1664,16 +1665,17 @@
"import numpy as np\n",
"import gc\n",
"\n",
"device_type = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"eval_dataloader = DataLoader(common_voice[\"test\"], batch_size=8, collate_fn=data_collator)\n",
"\n",
"model.eval()\n",
"for step, batch in enumerate(tqdm(eval_dataloader)):\n",
" with torch.cuda.amp.autocast():\n",
" with torch.amp.autocast(device_type=device_type):\n",
" with torch.no_grad():\n",
" generated_tokens = (\n",
" model.generate(\n",
" input_features=batch[\"input_features\"].to(\"cuda\"),\n",
" decoder_input_ids=batch[\"labels\"][:, :4].to(\"cuda\"),\n",
" input_features=batch[\"input_features\"].to(model.device),\n",
" decoder_input_ids=batch[\"labels\"][:, :4].to(model.device),\n",
" max_new_tokens=255,\n",
" )\n",
" .cpu()\n",

View File

@ -0,0 +1,9 @@
accelerate
git+https://github.com/bitsandbytes-foundation/bitsandbytes.git
datasets==3.6.0
evaluate
jiwer
librosa
soundfile
transformers==4.52.4
wandb

View File

@ -69,14 +69,17 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": null,
"id": "06cfd506",
"metadata": {},
"outputs": [],
"source": [
"def get_lora_sd_pipeline(\n",
" ckpt_dir, base_model_name_or_path=None, dtype=torch.float16, device=\"cuda\", adapter_name=\"default\"\n",
" ckpt_dir, base_model_name_or_path=None, dtype=torch.float16, device=\"auto\", adapter_name=\"default\"\n",
"):\n",
" if device == \"auto\":\n",
" device = torch.accelerator.current_accelerator().type if hasattr(torch, \"accelerator\") else \"cuda\"\n",
"\n",
" unet_sub_dir = os.path.join(ckpt_dir, \"unet\")\n",
" text_encoder_sub_dir = os.path.join(ckpt_dir, \"text_encoder\")\n",
" if os.path.exists(text_encoder_sub_dir) and base_model_name_or_path is None:\n",

View File

@ -277,7 +277,7 @@ def parse_args(input_args=None):
"--scale_lr",
action="store_true",
default=False,
help="Scale the learning rate by the number of GPUs, gradient accumulation steps, and batch size.",
help="Scale the learning rate by the number of accelerators, gradient accumulation steps, and batch size.",
)
parser.add_argument(
"--lr_scheduler",
@ -359,7 +359,7 @@ def parse_args(input_args=None):
choices=["no", "fp16", "bf16"],
help=(
"Whether to use mixed precision. Choose between fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to the value of accelerate config of the current system or the"
" 1.10.and an Nvidia Ampere GPU or Intel XPU. Default to the value of accelerate config of the current system or the"
" flag passed with the `accelerate.launch` command. Use this argument to override the accelerate config."
),
)
@ -370,7 +370,7 @@ def parse_args(input_args=None):
choices=["no", "fp32", "fp16", "bf16"],
help=(
"Choose prior generation precision between fp32, fp16 and bf16 (bfloat16). Bf16 requires PyTorch >="
" 1.10.and an Nvidia Ampere GPU. Default to fp16 if a GPU is available else fp32."
" 1.10.and an Nvidia Ampere GPU or Intel XPU. Default to fp16 if a GPU is available else fp32."
),
)
parser.add_argument("--local_rank", type=int, default=-1, help="For distributed training: local_rank")
@ -411,9 +411,11 @@ def b2mb(x):
class TorchTracemalloc:
def __enter__(self):
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated() # reset the peak gauge to zero
self.begin = torch.cuda.memory_allocated()
self.device_type = torch.accelerator.current_accelerator().type if hasattr(torch, "accelerator") else "cuda"
self.device_module = getattr(torch, self.device_type, torch.cuda)
self.device_module.empty_cache()
self.device_module.reset_peak_memory_stats() # reset the peak gauge to zero
self.begin = self.device_module.memory_allocated()
self.process = psutil.Process()
self.cpu_begin = self.cpu_mem_used()
@ -443,9 +445,9 @@ class TorchTracemalloc:
self.peak_monitoring = False
gc.collect()
torch.cuda.empty_cache()
self.end = torch.cuda.memory_allocated()
self.peak = torch.cuda.max_memory_allocated()
self.device_module.empty_cache()
self.end = self.device_module.memory_allocated()
self.peak = self.device_module.max_memory_allocated()
self.used = b2mb(self.end - self.begin)
self.peaked = b2mb(self.peak - self.begin)
@ -559,7 +561,7 @@ def collate_fn(examples, with_prior_preservation=False):
class PromptDataset(Dataset):
"A simple dataset to prepare the prompts to generate class images on multiple GPUs."
"A simple dataset to prepare the prompts to generate class images on multiple accelerators."
def __init__(self, prompt, num_samples):
self.prompt = prompt
@ -626,7 +628,7 @@ def main(args):
cur_class_images = len(list(class_images_dir.iterdir()))
if cur_class_images < args.num_class_images:
torch_dtype = torch.float16 if accelerator.device.type == "cuda" else torch.float32
torch_dtype = torch.float16 if accelerator.device.type in ["cuda", "xpu"] else torch.float32
if args.prior_generation_precision == "fp32":
torch_dtype = torch.float32
elif args.prior_generation_precision == "fp16":
@ -663,6 +665,8 @@ def main(args):
del pipeline
if torch.cuda.is_available():
torch.cuda.empty_cache()
elif torch.xpu.is_available():
torch.xpu.empty_cache()
# Handle the repository creation
if accelerator.is_main_process:
@ -740,7 +744,9 @@ def main(args):
print(text_encoder)
if args.enable_xformers_memory_efficient_attention:
if is_xformers_available():
if accelerator.device.type == "xpu":
logger.warn("XPU hasn't support xformers yet, ignore it.")
elif is_xformers_available():
unet.enable_xformers_memory_efficient_attention()
else:
raise ValueError("xformers is not available. Make sure it is installed correctly")
@ -753,7 +759,7 @@ def main(args):
# Enable TF32 for faster training on Ampere GPUs,
# cf https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices
if args.allow_tf32:
if args.allow_tf32 and torch.cuda.is_available():
torch.backends.cuda.matmul.allow_tf32 = True
if args.scale_lr:
@ -761,7 +767,7 @@ def main(args):
args.learning_rate * args.gradient_accumulation_steps * args.train_batch_size * accelerator.num_processes
)
# Use 8-bit Adam for lower memory usage or to fine-tune the model in 16GB GPUs
# Use 8-bit Adam for lower memory usage or to fine-tune the model in 16GB accelerators
if args.use_8bit_adam:
try:
import bitsandbytes as bnb
@ -1032,18 +1038,27 @@ def main(args):
)
del pipeline
torch.cuda.empty_cache()
if torch.cuda.is_available():
torch.cuda.empty_cache()
elif torch.xpu.is_available():
torch.xpu.empty_cache()
if global_step >= args.max_train_steps:
break
# Printing the GPU memory usage details such as allocated memory, peak memory, and total memory usage
# Printing the accelerator memory usage details such as allocated memory, peak memory, and total memory usage
if not args.no_tracemalloc:
accelerator.print(f"GPU Memory before entering the train : {b2mb(tracemalloc.begin)}")
accelerator.print(f"GPU Memory consumed at the end of the train (end-begin): {tracemalloc.used}")
accelerator.print(f"GPU Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}")
accelerator.print(
f"GPU Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
f"{accelerator.device.type.upper()} Memory before entering the train : {b2mb(tracemalloc.begin)}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Memory consumed at the end of the train (end-begin): {tracemalloc.used}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Peak Memory consumed during the train (max-begin): {tracemalloc.peaked}"
)
accelerator.print(
f"{accelerator.device.type.upper()} Total Peak Memory consumed during the train (max): {tracemalloc.peaked + b2mb(tracemalloc.begin)}"
)
accelerator.print(f"CPU Memory before entering the train : {b2mb(tracemalloc.cpu_begin)}")

Some files were not shown because too many files have changed in this diff.