Commit Graph

1384 Commits

Author SHA1 Message Date
53c25fe4fd Release: 0.17.1 changes (#2739)
* FIX Multiple issues with target_parameters (#2710)
* Bump version to 0.17.1
v0.17.1
2025-08-21 11:06:08 +02:00
48f6493f94 Release 0.17.0 (#2691)
- Bump versions
- Fix a few TODO comments
- A bit of cleanup in test_target_parameters.py
v0.17.0
2025-08-01 18:44:24 +02:00
337be05f03 ENH: Adapter injection based on state_dict (#2637)
Make it possible to inject the PEFT adapters based on a state_dict
instead of the PEFT config.

See https://github.com/huggingface/diffusers/issues/11874 for context.

Description

Right now, when creating a PEFT adapter like LoRA, the adapter layers
are injected based on the PEFT config, most notably the entries in
`target_modules`, but other arguments also play into this. Generally,
this is a good approach, but it breaks down in some situations. For
instance, in diffusers, we often have the situation that the checkpoint
was created without PEFT/diffusers, thus there is no PEFT config, only
the `state_dict`. To load these checkpoints in diffusers, the current
approach is to reverse-engineer a valid PEFT config based on the keys in
the `state_dict`.

Unfortunately, this is error prone. Moreover, not every combination of
`state_dict` keys can be easily expressed in a PEFT config through a
combination of `target_modules`, `exclude_modules`, etc. Yes, in theory
everything can be expressed by passing `target_module=<regex_pattern>`,
but reverse-engineering such a regex correctly and efficiently is very
hard (and thus currently not done).

This PR implements a completely different approach to inject adapters.
Instead of relying on the PEFT config to determine which layers to
target, it takes the `state_dict` directly as the source of truth. This
should allow matching exactly what is desired.
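
To illustrate the idea (a sketch only, not the actual PEFT code path), the modules to target can be derived from the keys of a LoRA `state_dict` roughly like this:

```python
# Illustrative sketch: collect the module names that carry LoRA weights in a
# checkpoint; the real implementation may differ in details.
def target_modules_from_state_dict(state_dict):
    targets = set()
    for key in state_dict:
        # LoRA keys typically look like "<module path>.lora_A.weight" etc.
        if ".lora_A." in key or ".lora_B." in key:
            targets.add(key.split(".lora_")[0])
    return sorted(targets)
```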

Implementation details

I took care to implement this change in a way that if no `state_dict` is
passed, the exact same code path as previously is taken. The risk of
breaking anything should thus be minimized.

Technically, it is not necessary to pass the `state_dict`; we are only
interested in the keys. I still called the argument `state_dict`, since
that is typically what we have at this point, but this can be easily
changed.

I thought it might be a good idea, if the `state_dict` is used, to still
check what modules would have been targeted if we had used the PEFT
config. Then, the results are compared and a warning is given if they
differ. This allows the user to see if the PEFT config is not correctly
specified. While running some diffusers tests, I never encountered this
warning, which is good. However, if we plan, for instance, to get rid of
all the reverse engineering of the PEFT config in diffusers, it would
make more sense to not give this warning.

Caveats

When the original LoRA model was using `target_parameters`, injecting
from `state_dict` will not work correctly. The problem is that the
`state_dict` looks the same, whether the module or a parameter was
targeted. Therefore, we cannot correctly determine the user's intent.

For now, what I decided to do is:

1. Always assume that `target_modules` is meant, as it's the far more
   common occurrence.
2. When we detect `target_parameters` while using `state_dict` for
   injection, we raise an error.
3. If we don't detect this, injection might just slip through, resulting
   in modules being targeted (if they are valid modules) instead of
   parameters.
4. Document that these two features don't work together.

I think overall, this is not too concerning, as both features are rather
niche and thus unlikely to be used in conjunction.

Related changes

While working on this PR, I made a couple of related, though not
strictly necessary, changes:

- Refactor tests in `test_low_level_api.py` to use pytest instead of
  unittest
- Add default target modules for LoHa and LoKr (just copying LoRA)
- Most PEFT methods' model classes like `LoraModel` had an `__init__`
  that effectively just called `super()` with the same arguments. I
  removed these `__init__` methods.
2025-08-01 18:39:53 +02:00
J.L
bb4fb50e2b FEAT Add MiSS as a replacement for Bone. (#2604)
Add MiSS, an evolution of Bone, from https://arxiv.org/abs/2409.15371.

MiSS will replace Bone, which is now deprecated. A script to convert Bone
checkpoints to MiSS checkpoints is included.
2025-08-01 18:37:20 +02:00
a91ec33fc5 Fix not detecting regex-targeted embedding layer (#2649)
This issue was found in PR #2638 and is described as follows:

> When calling `get_peft_model_state_dict(..., save_embedding_layers="auto")` we check if the
> embedding layer is targeted to determine if the embedding layers need saving. This is not
> done when `PeftConfig.target_modules` is a regex-string, potentially failing to save embeddings.

This is fixed by adding a check similar to the existing query of whether `EMBEDDING_LAYER_NAMES` is
a subset of the defined target modules, only that the regex matching from `BaseTuner.inject_adapter`
is used. To avoid code duplication, the matching was moved to its own utility function
`match_target_against_key`.
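
A rough sketch of what such a regex match looks like (the actual helper in PEFT may differ in details):

```python
# Sketch: a regex-valued target is matched against the full module key,
# mirroring how BaseTuner.inject_adapter decides whether a module is targeted.
import re

def matches_target(target: str, key: str) -> bool:
    return re.fullmatch(target, key) is not None

assert matches_target(r".*embed_tokens", "model.embed_tokens")
assert not matches_target(r".*embed_tokens", "lm_head")
```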

The main complication was defining the test cases, as it was non-trivial to determine what
`save_embedding_layers="auto"` entails. I've assembled a list of cases that I think are correct
in the corresponding unit test.
2025-07-31 16:08:32 +02:00
25e5c6b25c FIX Missing device map for facebook/opt-125m (#2675)
Fixes the failing EETQ test in the nightly multi device CI.

In #2612, fixed device_maps were added for multi-GPU training as we
could not rely on device_map="auto". While doing this change, one
device_map was missing, namely for facebook/opt-125m, which is used in
the EETQ multi device test. This device_map was now added. This makes
the test pass locally.
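
For illustration, a fixed device_map of this kind could look roughly as follows (the exact mapping used in the test may differ):

```python
# Hedged example: pin submodules of facebook/opt-125m to explicit devices
# instead of relying on device_map="auto" (requires two GPUs as written).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    device_map={
        "model.decoder.embed_tokens": 0,
        "model.decoder.embed_positions": 0,
        "model.decoder.final_layer_norm": 0,
        "model.decoder.layers": 1,
        "lm_head": 0,
    },
)
```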
2025-07-30 20:02:22 +02:00
5e00266e85 TST: Add more HF Hub model caching (#2682)
A bunch of tests in test_tuners_utils.py didn't use the decorator so
far, which is now fixed. This should hopefully help reduce timeouts.

Moreover, the iris dataset loading is now moved to a module-scoped
fixture (before, it was just loaded on module level). This doesn't help
with caching, but it prevents loading of this dataset when the
corresponding tests are not even run.
2025-07-30 20:02:07 +02:00
46ae69ac29 FIX Small fixes to target_parameters (#2677)
1. Better error message when same layer targeted twice
2. Remove unused attribute num_experts from _LoraParameterProxy
2025-07-30 14:34:04 +02:00
1c853eaaad Fix trainable tokens with fsdp (#2681)
When using FSDP with trainable tokens, there was an error when
retrieving the state_dict of the TrainableTokensWrapper. The reason is
that for the state_dict that is passed to get_peft_model_state_dict, the
FSDP wrapper was already unwrapped, which means the keys don't have the
FSDP-specific prefix. However, in the PEFT code, when looking up keys
from said state_dict, the prefix was not removed. Now it is removed,
making the lookup succeed. The same logic applies to
set_peft_model_state_dict.
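
The gist of the fix as a hedged sketch (the constant below is torch's FSDP wrapper prefix; the exact PEFT code may differ):

```python
# Sketch: strip the FSDP wrapper prefix from state_dict keys before lookup.
FSDP_PREFIX = "_fsdp_wrapped_module."

def strip_fsdp_prefix(state_dict):
    return {key.replace(FSDP_PREFIX, ""): value for key, value in state_dict.items()}
```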

I could successfully start training with FSDP and trainable tokens
locally by adjusting the examples/sft script to include trainable
tokens. Checkpoints could be successfully created and resumed from. The
only change I needed to make was to configure use_orig_params=True for
FSDP.
2025-07-30 14:33:53 +02:00
c11a9dfeaa FIX Failing target_parameters param usage count (#2676)
For testing target_parameters, we use a tiny Llama4 model. This model
was refactored in
https://github.com/huggingface/transformers/pull/39501, resulting in one
parameter being accessed an additional time:

https://github.com/huggingface/transformers/pull/39501/files#diff-e668ec07f78afdb2cb805d939e47453757f0b9437436cb860fcb7cb2431c9cf5R69

Therefore, a unit test that relied on how often this parameter was
accessed started failing. This PR updates the count to the correct
number.

Additionally, debug print statements that were accidentally left over are
now removed.
2025-07-30 12:29:51 +02:00
92d65cafa5 Update extending vocab docs (#2669)
- Recommends trainable tokens as first measure
- Clarifies a few things about saving embeddings
- Adds full-finetuning as an option of last resort

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-25 13:09:00 +02:00
434651346c ENH: Targeting multiple parameters on the same module (#2665)
When the target_parameters feature for LoRA was introduced in #2638,
there was one gap, namely the possibility to target multiple
nn.Parameters on the same module (there was only a workaround involving
multiple adapters, but that is not user friendly). With this PR, it is
now possible to achieve this.
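
For example, a config of the following shape is now possible (the parameter paths are made-up examples; see the description of the mechanism below):

```python
# Hedged example: two nn.Parameters on the same module targeted by one adapter.
from peft import LoraConfig

config = LoraConfig(
    r=8,
    target_parameters=[
        "feed_forward.experts.gate_up_proj",  # first parameter on the module
        "feed_forward.experts.down_proj",     # second parameter on the same module
    ],
)
```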

The mechanism to enable this is a bit crude, namely allowing nesting of
multiple ParamWrappers. This should generally be fine as long as there
are only a couple of nn.Parameters being targeted on the same module.
When there are dozens or hundreds, this approach could lead to slowdowns
or other issues.

A side effect of this implementation is that the ParamWrapper, when it
removes the parametrization, now only removes its own parametrization.
When using nn.utils.parametrize.remove_parametrization, it removes all
parametrizations, which is bad when we have nested parametrizations.

Alternative approaches

Some alternative approaches were discussed internally but the chosen one
was considered most practical.

- Allow more than one adapted parameter per LoRA layer. This would
  require nested dicts for the LoRA parameters, something like
  self.lora_A[adapter_name][parameter_name]. We don't have this anywhere
  so far and it would probably break implicit assumptions about PEFT
  layers in many places (like parsing of state_dict keys), requiring many
  adjustments.
- Have an auxiliary module that contains the individual LoRA layers that
  target the individual parameters. This could be the cleanest solution
  and would probably be more efficient if there is a huge number of
  targeted parameters per module. However, it also brings extra
  complexity, as it requires implementing the logic of how to route the
  information to the right parameter, and it may be a solution to a
  problem that is irrelevant in practice (a large number of targets per
  module).
2025-07-24 19:42:19 +02:00
43845f9b14 Method Comparison: Improve formatting/layout of table (#2670)
* Method Comparison: Improve formatting/layout of table

Quick improvement to reduce the dominance of columns like `{peft,train}_config` and make
numbers a bit more readable through proper decimal/thousands formatting.

* Bump gradio version to accommodate required fixes
2025-07-24 19:02:09 +02:00
663b1209fd ENH Llama-Adapters support for GPT2 (#2643)
aka "adaption prompt"
2025-07-24 14:51:16 +02:00
04a5ed7b2f DOC Fix error in code example (#2666) 2025-07-24 12:13:41 +02:00
a795199ffa Update tokenizer parameter in sfttrainer across multiple examples (#2664)
* REFAC Update tokenizer parameter to processing_class in SFTTrainer instances across multiple examples

* REFAC Replace tokenizer parameter with processing_class in Trainer instances across documentation and examples

* Refactor tokenizer parameter to processing_class in various examples

- Updated the Trainer initialization in corda_finetuning.py to use processing_class instead of tokenizer.
- Changed the execution_count to null in image_classification_peft_lora.ipynb.
- Modified the tokenizer parameter to processing_class in image_classification_peft_lora.ipynb.
- Adjusted the tokenizer parameter to processing_class in peft_bnb_whisper_large_v2_training.ipynb.
- Updated the README.md in lorafa_finetune to reflect the change from tokenizer to processing_class in Trainer initialization.

* REFAC Update tokenizer parameter to processing_class in Seq2SeqTrainer instantiation

* REFAC Replace tokenizer parameter with processing_class in README and notebook examples
2025-07-23 15:30:28 +02:00
f650b08abb make method comparison device agnostic, so it can expand to more accelerators like XPU (#2610)
make method comparison device agnostic, so it can expand to more
accelerators like XPU

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-07-22 15:25:56 +02:00
e77924563a FIX Prefix tuning after transformers PR 38635 (#2662)
Due to https://github.com/huggingface/transformers/pull/38635, several
tests involving prefix tuning broke:

https://github.com/huggingface/peft/actions/runs/16417140904/job/46385751329

This PR fixes this by resolving two issues:

1. The _supports_cache_class attribute was removed, we can now assume
that it is True if the attribute does not exist.

2. We had special handling of past_key_values for GPTBigCodeForCausalLM
which is no longer required (nor valid) after that PR, so it is removed
depending on the transformers version.
2025-07-22 13:59:34 +02:00
fa85d10a7f Update README.md (#2659)
Update bibtex entry.
2025-07-21 14:36:02 +02:00
f3b97c3704 FEAT Allow LoRA to target nn.Parameter (#2638)
Normally, nn.Parameter cannot be targeted with LoRA adapters. This can
be problematic, e.g. when there are MoE layers that use nn.Parameter
directly, or when there is nn.Linear but the weight is passed directly
instead of calling forward (e.g. MHA).

It would be possible to craft a solution involving a special LoRA layer
for each of the modules that use nn.Parameter directly (e.g. lora.MHA)
but that doesn't scale. This PR implements a direct way to target
nn.Parameter, making use of torch.nn.utils.parametrize.

Using the feature requires passing target_parameters to the LoraConfig.
During the forward pass, when the parameter is accessed, the LoRA
weights are added to the weights while still ensuring that gradients
flow correctly to the LoRA weights.
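
A minimal, self-contained sketch of the underlying mechanism, using plain torch.nn.utils.parametrize (not PEFT's actual implementation; class name and shapes are made up for illustration):

```python
# Sketch: a parametrization adds a low-rank delta to a frozen nn.Parameter
# each time it is accessed, so gradients flow to the small LoRA-style factors.
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LowRankDelta(nn.Module):
    def __init__(self, out_features, in_features, r=4):
        super().__init__()
        # LoRA-style init: the delta starts at zero because lora_B is zero
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, weight):
        # called every time the parametrized parameter is accessed
        return weight + self.lora_B @ self.lora_A

linear = nn.Linear(16, 16)
linear.weight.requires_grad_(False)  # freeze the base weight
parametrize.register_parametrization(linear, "weight", LowRankDelta(16, 16))

loss = linear(torch.randn(2, 16)).sum()
loss.backward()  # gradients are computed for lora_A / lora_B, not the frozen weight
```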

Right now, only LoRA supports this feature. Moreover, it is not possible
to target multiple parameters of the same module with the same adapter.
A workaround is to use multiple adapters (i.e. with different names).

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-07-15 16:18:46 +02:00
22506a8e42 FIX Deploy method comp app: error in workflow file (#2645)
Fixing the error reported by the GitHub Actions check
"Deploy "method_comparison" Gradio to Spaces":

Check failure on line 11 in .github/workflows/deploy_method_comparison_app.yml
Invalid workflow file: The workflow is not valid.
.github/workflows/deploy_method_comparison_app.yml (Line: 11, Col: 13):
A mapping was not expected

The annotated lines in the workflow file:

permissions:
  contents: {}
2025-07-14 14:48:06 +02:00
1c75d96aca FIX: Prompt learning methods modules_to_save issue (#2646)
When using prompt learning methods, modules_to_save was not correctly
set automatically. This is really bad when using, for instance, sequence
classification tasks, which require the classifier layer to be added to
modules_to_save.

The issue was introduced in #2220 where it is wrongly assumed that the
PEFT config always has a modules_to_save attribute, which is not true
for prompt learning. In #2481, this was partly fixed by using getattr to
avoid an error. However, this did not resolve the fundamental issue that
for prompt learning, there is no such attribute, resulting in
modules_to_save not being applied.

This PR proposes to fix this by adding modules_to_save to the prompt
learning configs.
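
Hedged usage example (the config below is illustrative; after this fix, the classification head ends up in modules_to_save automatically when wrapping a sequence classification model):

```python
# Illustrative: prompt tuning for sequence classification; the classification
# head must be kept trainable and saved via modules_to_save.
from peft import PromptTuningConfig, TaskType

config = PromptTuningConfig(task_type=TaskType.SEQ_CLS, num_virtual_tokens=20)
```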
2025-07-14 13:57:33 +02:00
a4f9334f12 FEAT Add SHiRA Adapters (#2584)
Implements: Sparse High Rank Adapters

Paper: https://arxiv.org/abs/2406.13175
2025-07-14 11:16:10 +02:00
35000fda88 Fix #2634: Allow peft_method to be a string (#2635)
The auto-tagging code assumed that every `PeftConfig.peft_type` value is an Enum value but
when adding custom types without modifying the enum it is possible to have strings as well
(and the interface supports that).

This change allows for string values of `PeftConfig.peft_type` in the auto-tagging code.
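
A sketch of the kind of normalization this implies (not the exact code):

```python
# Sketch: peft_type can be a PeftType enum member or a plain string.
from peft import PeftType

def peft_type_name(peft_type) -> str:
    return peft_type.value if isinstance(peft_type, PeftType) else str(peft_type)
```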
2025-07-08 11:13:06 +02:00
0755ab93f6 FIX Faulty OFT parameter device test (#2630)
There is an error in an OFT test because .cpu() is called on a parameter
instead of a module. Calling it on a parameter is not an in-place
operation, so it has no effect.
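
The distinction in a nutshell (assumes a CUDA device is available):

```python
# Tensor.cpu() on a parameter only returns a copy; Module.cpu() moves the
# module's parameters in place, which is what the test needs.
import torch.nn as nn

layer = nn.Linear(2, 2).cuda()
layer.weight.cpu()   # returns a CPU copy, layer.weight stays on the GPU
layer.cpu()          # moves all of the module's parameters to the CPU
```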
2025-07-07 15:57:06 +02:00
fa9e429e93 FIX Correctly skip AWQ test based on torch version (#2631)
There is currently an issue with a multi-GPU test using AutoAWQ. Thus,
PR #2529 introduced an unconditional skip for this test. In #2596, a
condition was added to only skip with torch 2.7, as other torch versions
are not affected. However, the is_torch_version function does not
actually match minor and patch versions, so

is_torch_version("==", "2.7")

returns False when using version 2.7.1.

This PR fixes that by checking both "2.7.0" and "2.7.1" explicitly. This
is not very robust in case there are further patch releases of
PyTorch. However, that is unlikely, and introducing a more general
solution is IMO not worth it just for this instance.
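
For reference, an equivalent explicit check could look like this (a sketch using the packaging library, not the transformers helper):

```python
# Sketch: compare against exact patch versions, since a minor-only check
# like is_torch_version("==", "2.7") misses 2.7.1.
import torch
from packaging import version

release = version.parse(torch.__version__).release
skip_awq_multi_gpu_test = release[:3] in {(2, 7, 0), (2, 7, 1)}
```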
2025-07-07 15:55:37 +02:00
d76f3fe98c FIX Create mask function signature change (#2633)
We use create_mask_for_generate from transformers. It was introduced in
v4.53.0 but in v4.53.1, the function signature was changed to include
position_ids as a mandatory argument:

https://github.com/huggingface/transformers/pull/39194

This breaks our function call in PEFT. This PR fixes the function call
by passing position_ids. This in turn would break the function call with
transformers v4.53.0, thus a strict version check is being used for >=
v4.53.1.
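
Sketched version gate (illustrative; the actual call site lives in PEFT's prompt learning code):

```python
# Sketch: only pass position_ids on transformers versions that require it.
import transformers
from packaging import version

def mask_fn_needs_position_ids() -> bool:
    # create_mask_for_generate takes position_ids as a mandatory argument
    # from transformers v4.53.1 onward
    return version.parse(transformers.__version__) >= version.parse("4.53.1")
```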
2025-07-07 11:46:57 +02:00
b960d259e8 ENH Enable FSDP example for GPTQ quantized model (#2626)
Besides fixes, includes an example script that uses
`hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4`

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-07-07 11:08:03 +02:00
9f01809e70 FEAT: Add GH action to deploy method comparison app (#2625)
* FEAT Add GH action to deploy method comparison app

* Add to git credentials

* Different approach

* More fixes

* Fix for requirements

* Another approach

* Bah

* Change trigger to changes in method_comparison/

Manual trigger still possible

* Update method_comparison/README.md

* Satisfy Zizmor
2025-07-04 14:46:59 +02:00
4ad953aefb Bump version to 0.16.1.dev0 after release (#2632) 2025-07-04 14:46:48 +02:00
45996a1d6e Release 0.16.0 (#2629)
- Bump versions
- Update a comment to point to a new PR
- Remove a test skip that is obsolete after #2579
v0.16.0
2025-07-03 17:24:25 +02:00
79955723d8 Auto-tagging of PEFT models (#2599)
Features like inference need correctly set tags on the repo / the model card
in order to be available. Also the Hub uses tags to index the models and make
them searchable.

With this change, PEFT automatically tags models with `lora` if they happen
to be trained with LoRA, with the base model, and with a custom
`peft:method:<the method>` tag.

* Base model tags were never supported, they are now

Before PEFT simply ignored tags provided by the base model. Now the
base model tags are added to the PEFT-specific model tags.

* Tag 'transformers' and add pipeline tag if possible

We remove the `peft:method:*` tag because it needs more discussion
and is partially unrelated to this change. It is replaced by the necessary
`transformers` tag if the model is based on transformers.

We're also trying to resolve the pipeline tag automatically if it isn't set.
While there is the `transformers.pipelines.base.SUPPORTED_PEFT_TASKS` mapping
it is not sufficient to resolve the pipeline tag automatically since it is
not a 1:1 mapping. Only the causal LM case is a unique mapping.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-03 11:45:26 +02:00
180777ea97 TST Update diffusers hotswap tests (#2619)
When the diffusers hotswap tests were added to PEFT in #2120, the
diffusers test was marked as xfail because hotswapping was not yet
implemented in diffusers. This has long been achieved but the test was
not updated.

This PR now updates the diffusers test in PEFT and removes the xfail.
The new test is basically a copy of the corresponding test in diffusers.
Moreover, I enhanced the test according to #2611 to also ensure that
there are no CUDA graph re-records.
2025-07-02 16:56:55 +02:00
ce3b995f5b FIX CI Multi-GPU tests require device_map (#2612)
As discussed internally, since
https://github.com/huggingface/transformers/pull/37982, some multi-GPU
tests started failing because all parameters are loaded onto a single
GPU. This should now be fixed by providing an explicit device_map
instead of relying on "auto".

Furthermore, for an unknown reason, the HQQ test started failing as the
correlation dipped below 0.97 -- to 0.9696 actually. I think this is
close enough to not warrant further investigation. Therefore, I only
decreased the threshold.
2025-07-02 16:56:18 +02:00
05395fb2de FIX Type annotation error in method comparison (#2628)
Resolves an issue introduced by #2617
2025-07-02 16:33:22 +02:00
2bc97c02b7 FIX Improved handling of conv groups (#2567)
More generalized handling of groups argument in LoRA/DoRA conv layers
(previous solution: #2403).
2025-06-30 16:49:09 +02:00
e6577076bf FEAT Add C3A (Circular Convolution Adaptation) (#2577)
Add new PEFT method C³A (Circular Convolution Adaptation).

From "Parameter-Efficient Fine-Tuning via Circular Convolution":
https://arxiv.org/abs/2407.19342
2025-06-30 14:17:11 +02:00
456292649a FIX Update signature for resolve_lora_variant (#2618)
The function signature was missing **kwargs, which results in a failure
after merging #2571.
2025-06-27 16:57:05 +02:00
87703ba0e5 TST Skip (more) failing MacOS tests (#2620)
We have new MacOS tests that are failing, presumably due to the old
torch version used for MacOS GH CI runners. It's just a handful of tests
related to prefix tuning, IMO not worth trying to fix, as the error is
deep within transformers. Therefore, just skip these tests.
2025-06-27 16:56:51 +02:00
171da8ed60 FIX Attention mask dict issue, generate w/ gemma (#2579)
Resolves CI errors such as this one:

https://github.com/huggingface/peft/actions/runs/15481482956/job/43588020111#step:5:53182

After resolving that error, other errors can occur, but they're
unrelated and investigated independently.

After the transformers change in
https://github.com/huggingface/transformers/pull/37866, it can happen
that:

> Models using different types of attention in different layers (e.g.
gemma3) will now have a dict returned by
prepare_inputs_for_generation (one dict entry per attention type)

As PEFT operates on the attention mask for prompt learning methods, we
need to adjust the code for the possibility of attention_mask being a
dict. Right now, I simply extract the single value if the dict is just
one element. For other sizes, I just raise an error, as I don't know how
to deal with that. For our tests, this is enough but we might need to
find a better solution in the future.
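
The handling described above, as a minimal sketch (the exact PEFT code may differ):

```python
# Sketch: unwrap a single-entry attention-mask dict, error out otherwise.
def unwrap_attention_mask(attention_mask):
    if isinstance(attention_mask, dict):
        if len(attention_mask) != 1:
            raise ValueError("Cannot handle attention_mask dicts with more than one entry.")
        return next(iter(attention_mask.values()))
    return attention_mask
```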
2025-06-27 13:40:09 +02:00
bbc9f5dc8b FIX Avoid CUDA Graph re-record with hotswap (#2611) 2025-06-27 11:33:09 +02:00
d26f332543 ENH Method comparison: temp result files with ts (#2617)
In #2593, the timestamp was removed from the file name of result files.
This makes sense for the proper results, as those should have unique
file names and are tracked in git. However, for temporary and cancelled
results, this is not true. Therefore, the timestamp is added back in.

Moreover, I applied ruff to the MetaMathQA/ directory (it's not applied
automatically) and fixed some imports. Ruff seems to get confused about
local modules, thus the data and utils imports are treated differently,
but IMO no big deal.
2025-06-26 16:48:10 +02:00
5af0cbe4ee FIX: Trainable tokens error with DeepSpeed ZeRO3 (#2605)
Resolves #2603

Trainable tokens are erroring when using DS Z3 because the embedding
weights are not available on all ranks. This solution fixes this in an
efficient way that collects these weights on a single rank, initializes
them, and then broadcasts only the slice that is affected.
2025-06-26 16:47:58 +02:00
d936478f07 ENH Make OFT faster and more memory efficient (#2575)
Make OFT faster and more memory efficient. This new version of OFT is
not backwards compatible with older checkpoints and vice versa. To load
older checkpoints, downgrade PEFT to 0.15.2 or lower.
2025-06-26 14:27:03 +02:00
e34852f7b6 ENH Support Quantization-Aware LoRA with GPTQ (#2571)
Support for Quantization-Aware Low-Rank Adaptation (QALoRA) for GPTQ.
2025-06-26 11:51:38 +02:00
bda9665bc9 Results with number of parameters + full fine tuning (#2602)
This change updates all results with their respective number of
parameters (trained + absolute) and adds the newly introduced
full-finetuning.

In addition to these results there was also an issue with the
Makefile as it didn't consider the possibility of having experiments
that don't have an adapter config (e.g., full fine-tuning).
2025-06-24 18:00:46 +02:00
d67d03439c TST XPU regression tests with deterministic (#2600)
---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-06-24 15:42:03 +02:00
59ef3b93c8 FIX: Transformers VLM architecture changes (#2574)
FIX Transformers VLM architecture changes

Follow up to #2554
See discussion in https://github.com/huggingface/transformers/pull/38627

To quote:

> transformers PR #37033 re-arranges the way visual language models are
built by moving the LM head from the language model to the top-level
VLM (among other things).

A consequence of this is that the keys in the PEFT state_dict now also
follow the new architecture. This means that:

1. If a PEFT checkpoint was saved with the old architecture but is
   loaded with the new architecture, loading fails.
2. If a PEFT checkpoint was saved with the new architecture but is
   loaded with the old architecture, loading fails.

1. can be addressed by making use of the newly added
_checkpoint_conversion_mapping attribute for models with the new
architecture. In transformers, this is used to map old model state_dicts
to the new state_dict format. In PEFT, with some fiddling, we can use
the same mapping to make old PEFT state_dicts compatible with the new
architecture (backwards compatibility).

However, 2. is not easily addressed. We would need a reverse mapping for
this. This could be easily derived from _checkpoint_conversion_mapping,
but since this attribute doesn't exist on old models, we cannot do that.
Therefore, new checkpoints created with PEFT on these models won't load
successfully when users use old transformers (forward compatibility).

These cases are covered by the added unit tests; the tests covering
case 2 are marked as xfail.

If we could reliably detect that we are in case 2, we could warn the
user and advise them to upgrade transformers, but I don't know if it's
possible to figure this out.

We also allow users to pass their own key_mapping to from_pretrained and
load_adapter, though the documentation advises against it. This argument
could theoretically be used as a workaround in case there is indeed an
issue with prompt learning state_dicts.
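
To illustrate the general shape of such a key mapping (regex pattern to replacement), as a sketch rather than the exact PEFT/transformers code:

```python
# Sketch: remap old-architecture state_dict keys using a
# {regex pattern: replacement} mapping, similar in spirit to
# _checkpoint_conversion_mapping in transformers.
import re

def remap_keys(state_dict, key_mapping):
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in key_mapping.items():
            new_key, n = re.subn(pattern, replacement, new_key)
            if n:
                break
        remapped[new_key] = value
    return remapped
```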

Apart from these changes, I also made a small change to account for
https://github.com/huggingface/transformers/issues/38017#issuecomment-2935889679.
2025-06-23 17:39:40 +02:00
bd893a8a36 TST Enable some further XPU tests to pass (#2596)
---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-23 14:51:49 +02:00
5fe7f8f8ab ENH: Method comparison allow full finetuning (#2597)
- Allow full fine-tuning
- Add an experiment for full fine-tuning
- Rename some columns that had wrong names
- Remove redundant metric
- Factor out file size calculation (estimate for FT)
2025-06-19 18:10:20 +02:00