Commit Graph

1384 Commits

Author SHA1 Message Date
53c25fe4fd Release: 0.17.1 changes (#2739)
* FIX Multiple issues with target_parameters (#2710)
* Bump version to 0.17.1
v0.17.1
2025-08-21 11:06:08 +02:00
48f6493f94 Release 0.17.0 (#2691)
- Bump versions
- Fix a few TODO comments
- A bit of cleanup in test_target_parameters.py
v0.17.0
2025-08-01 18:44:24 +02:00
337be05f03 ENH: Adapter injection based on state_dict (#2637)
Make it possible to inject the PEFT adapters based on a state_dict
instead of the PEFT config.

See https://github.com/huggingface/diffusers/issues/11874 for context.

Description

Right now, when creating a PEFT adapter like LoRA, the adapter layers
are injected based on the PEFT config, most notably the entries in
`target_modules`, but other arguments also play into this. Generally,
this is a good approach, but it breaks down in some situations. For
instance, in diffusers, we often have the situation that the checkpoint
was created without PEFT/diffusers, thus there is no PEFT config, only
the `state_dict`. To load these checkpoints in diffusers, the current
approach is to reverse-engineer a valid PEFT config based on the keys in
the `state_dict`.

Unfortunately, this is error prone. Moreover, not every combination of
`state_dict` keys can be easily expressed in a PEFT config through a
combination of `target_modules`, `exclude_modules`, etc. Yes, in theory
everything can be expressed by passing `target_module=<regex_pattern>`,
but reverse-engineering such a regex correctly and efficiently is very
hard (and thus currently not done).

This PR implements a completely different approach to inject adapters.
Instead of relying on the PEFT config to determine which layers to
target, it takes the `state_dict` directly as the source of truth. This
should allow matching exactly what is desired.
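
To illustrate the idea (a sketch only, not the actual PEFT code path), the modules to target can be derived from the keys of a LoRA `state_dict` roughly like this:

```python
# Illustrative sketch: collect the module names that carry LoRA weights in a
# checkpoint; the real implementation may differ in details.
def target_modules_from_state_dict(state_dict):
    targets = set()
    for key in state_dict:
        # LoRA keys typically look like "<module path>.lora_A.weight" etc.
        if ".lora_A." in key or ".lora_B." in key:
            targets.add(key.split(".lora_")[0])
    return sorted(targets)
```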

Implementation details

I took care to implement this change in a way that if no `state_dict` is
passed, the exact same code path as previously is taken. The risk of
breaking anything should thus be minimized.

Technically, it is not necessary to pass the `state_dict`; we are only
interested in the keys. I still called the argument `state_dict`, since
that is typically what we have at this point, but this can be easily
changed.

I thought it might be a good idea, if the `state_dict` is used, to still
check what modules would have been targeted if we had used the PEFT
config. Then, the results are compared and a warning is given if they
differ. This allows the user to see if the PEFT config is not correctly
specified. While running some diffusers tests, I never encountered this
warning, which is good. However, if we plan, for instance, to get rid of
all the reverse engineering of the PEFT config in diffusers, it would
make more sense to not give this warning.

Caveats

When the original LoRA model was using `target_parameters`, injecting
from `state_dict` will not work correctly. The problem is that the
`state_dict` looks the same, whether the module or a parameter was
targeted. Therefore, we cannot correctly determine the user's intent.

For now, what I decided to do is:

1. Always assume that `target_modules` is meant, as it's the far more
   common occurrence.
2. When we detect `target_parameters` while using `state_dict` for
   injection, we raise an error.
3. If we don't detect this, injection might just slip through, resulting
   in modules being targeted (if they are valid modules) instead of
   parameters.
4. Document that these two features don't work together.

I think overall, this is not too concerning, as both features are rather
niche and thus unlikely to be used in conjunction.

Related changes

While working on this PR, I made a couple of related, though not
strictly necessary, changes:

- Refactor tests in `test_low_level_api.py` to use pytest instead of
  unittest
- Add default target modules for LoHa and LoKr (just copying LoRA)
- Most PEFT methods' model classes like `LoraModel` had an `__init__`
  that effectively just called `super()` with the same arguments. I
  removed these `__init__` methods.
2025-08-01 18:39:53 +02:00
J.L
bb4fb50e2b FEAT Add MiSS as a replacement for Bone. (#2604)
Add MiSS, an evolution of Bone, from https://arxiv.org/abs/2409.15371.

MiSS will replace Bone, which is now deprecated. A script to convert Bone
checkpoints to MiSS checkpoints is included.
2025-08-01 18:37:20 +02:00
a91ec33fc5 Fix not detecting regex-targeted embedding layer (#2649)
This issue was found in PR #2638 and is described as follows:

> When calling `get_peft_model_state_dict(..., save_embedding_layers="auto")` we check if the
> embedding layer is targeted to determine if the embedding layers need saving. This is not
> done when `PeftConfig.target_modules` is a regex-string, potentially failing to save embeddings.

This is fixed by adding a check similar to the existing query of whether `EMBEDDING_LAYER_NAMES` is
a subset of the defined target modules, only that the regex matching from `BaseTuner.inject_adapter`
is used. To avoid code duplication, the matching was moved to its own utility function
`match_target_against_key`.
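
A rough sketch of what such a regex match looks like (the actual helper in PEFT may differ in details):

```python
# Sketch: a regex-valued target is matched against the full module key,
# mirroring how BaseTuner.inject_adapter decides whether a module is targeted.
import re

def matches_target(target: str, key: str) -> bool:
    return re.fullmatch(target, key) is not None

assert matches_target(r".*embed_tokens", "model.embed_tokens")
assert not matches_target(r".*embed_tokens", "lm_head")
```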

The main complication was defining the test cases, as it was non-trivial to determine what
`save_embedding_layers="auto"` entails. I've assembled a list of cases that I think are correct
in the corresponding unit test.
2025-07-31 16:08:32 +02:00
25e5c6b25c FIX Missing device map for facebook/opt-125m (#2675)
Fixes the failing EETQ test in the nightly multi device CI.

In #2612, fixed device_maps were added for multi-GPU training as we
could not rely on device_map="auto". While doing this change, one
device_map was missing, namely for facebook/opt-125m, which is used in
the EETQ multi device test. This device_map was now added. This makes
the test pass locally.
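
For illustration, a fixed device_map of this kind could look roughly as follows (the exact mapping used in the test may differ):

```python
# Hedged example: pin submodules of facebook/opt-125m to explicit devices
# instead of relying on device_map="auto" (requires two GPUs as written).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    device_map={
        "model.decoder.embed_tokens": 0,
        "model.decoder.embed_positions": 0,
        "model.decoder.final_layer_norm": 0,
        "model.decoder.layers": 1,
        "lm_head": 0,
    },
)
```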
2025-07-30 20:02:22 +02:00
5e00266e85 TST: Add more HF Hub model caching (#2682)
A bunch of tests in test_tuners_utils.py didn't use the decorator so
far, which is now fixed. This should hopefully help reduce timeouts.

Moreover, the iris dataset loading is now moved to a module-scoped
fixture (before, it was just loaded on module level). This doesn't help
with caching, but it prevents loading of this dataset when the
corresponding tests are not even run.
2025-07-30 20:02:07 +02:00
46ae69ac29 FIX Small fixes to target_parameters (#2677)
1. Better error message when same layer targeted twice
2. Remove unused attribute num_experts from _LoraParameterProxy
2025-07-30 14:34:04 +02:00
1c853eaaad Fix trainable tokens with fsdp (#2681)
When using FSDP with trainable tokens, there was an error when
retrieving the state_dict of the TrainableTokensWrapper. The reason is
that for the state_dict that is passed to get_peft_model_state_dict, the
FSDP wrapper was already unwrapped, which means the keys don't have the
FSDP-specific prefix. However, in the PEFT code, when looking up keys
from said state_dict, the prefix was not removed. Now it is removed,
making the lookup succeed. The same logic applies to
set_peft_model_state_dict.
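
The gist of the fix as a hedged sketch (the constant below is torch's FSDP wrapper prefix; the exact PEFT code may differ):

```python
# Sketch: strip the FSDP wrapper prefix from state_dict keys before lookup.
FSDP_PREFIX = "_fsdp_wrapped_module."

def strip_fsdp_prefix(state_dict):
    return {key.replace(FSDP_PREFIX, ""): value for key, value in state_dict.items()}
```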

I could successfully start training with FSDP and trainable tokens
locally by adjusting the examples/sft script to include trainable
tokens. Checkpoints could be successfully created and resumed from. The
only change I needed to make was to configure use_orig_params=True for
FSDP.
2025-07-30 14:33:53 +02:00
c11a9dfeaa FIX Failing target_parameters param usage count (#2676)
For testing target_parameters, we use a tiny Llama4 model. This model
was refactored in
https://github.com/huggingface/transformers/pull/39501, resulting in one
parameter being accessed an additional time:

https://github.com/huggingface/transformers/pull/39501/files#diff-e668ec07f78afdb2cb805d939e47453757f0b9437436cb860fcb7cb2431c9cf5R69

Therefore, a unit test that relied on how often this parameter was
accessed started failing. This PR updates the count to the correct
number.

Additionally, debug print statements that were accidentally left over are
now removed.
2025-07-30 12:29:51 +02:00
92d65cafa5 Update extending vocab docs (#2669)
- Recommends trainable tokens as first measure
- Clarifies a few things about saving embeddings
- Adds full-finetuning as an option of last resort

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-25 13:09:00 +02:00
434651346c ENH: Targeting multiple parameters on the same module (#2665)
When the target_parameters feature for LoRA was introduced in #2638,
there was one gap, namely the possibility to target multiple
nn.Parameters on the same module (there was only a workaround involving
multiple adapters, but that is not user friendly). With this PR, it is
now possible to achieve this.
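
For example, a config of the following shape is now possible (the parameter paths are made-up examples; see the description of the mechanism below):

```python
# Hedged example: two nn.Parameters on the same module targeted by one adapter.
from peft import LoraConfig

config = LoraConfig(
    r=8,
    target_parameters=[
        "feed_forward.experts.gate_up_proj",  # first parameter on the module
        "feed_forward.experts.down_proj",     # second parameter on the same module
    ],
)
```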

The mechanism to enable this is a bit crude, namely allowing nesting of
multiple ParamWrappers. This should generally be fine as long as there
are only a couple of nn.Parameters being targeted on the same module.
When there are dozens or hundreds, this approach could lead to slowdowns
or other issues.

A side effect of this implementation is that the ParamWrapper, when it
removes the parametrization, now only removes its own parametrization.
When using nn.utils.parametrize.remove_parametrization, it removes all
parametrizations, which is bad when we have nested parametrizations.

Alternative approaches

Some alternative approaches were discussed internally but the chosen one
was considered most practical.

- Allow more than one adapted parameter per LoRA layer. This would
  require nested dicts for the LoRA parameters, something like
  self.lora_A[adapter_name][parameter_name]. We don't have this anywhere
  so far and it would probably break implicit assumptions about PEFT
  layers in many places (like parsing of state_dict keys), requiring many
  adjustments.
- Have an auxiliary module that contains the individual LoRA layers that
  target the individual parameters. This could be the cleanest solution
  and would probably be more efficient if there is a huge number of
  targeted parameters per module. However, it also brings extra
  complexity, as it requires implementing the logic of how to route the
  information to the right parameter, and it may be a solution to a
  problem that is irrelevant in practice (a large number of targets per
  module).
2025-07-24 19:42:19 +02:00
43845f9b14 Method Comparison: Improve formatting/layout of table (#2670)
* Method Comparison: Improve formatting/layout of table

Quick improvement to reduce the dominance of columns like `{peft,train}_config` and make
numbers a bit more readable through proper decimal/thousands formatting.

* Bump gradio version to accommodate required fixes
2025-07-24 19:02:09 +02:00
663b1209fd ENH Llama-Adapters support for GPT2 (#2643)
aka "adaption prompt"
2025-07-24 14:51:16 +02:00
04a5ed7b2f DOC Fix error in code example (#2666) 2025-07-24 12:13:41 +02:00
a795199ffa Update tokenizer parameter in sfttrainer across multiple examples (#2664)
* REFAC Update tokenizer parameter to processing_class in SFTTrainer instances across multiple examples

* REFAC Replace tokenizer parameter with processing_class in Trainer instances across documentation and examples

* Refactor tokenizer parameter to processing_class in various examples

- Updated the Trainer initialization in corda_finetuning.py to use processing_class instead of tokenizer.
- Changed the execution_count to null in image_classification_peft_lora.ipynb.
- Modified the tokenizer parameter to processing_class in image_classification_peft_lora.ipynb.
- Adjusted the tokenizer parameter to processing_class in peft_bnb_whisper_large_v2_training.ipynb.
- Updated the README.md in lorafa_finetune to reflect the change from tokenizer to processing_class in Trainer initialization.

* REFAC Update tokenizer parameter to processing_class in Seq2SeqTrainer instantiation

* REFAC Replace tokenizer parameter with processing_class in README and notebook examples
2025-07-23 15:30:28 +02:00
f650b08abb make method comparison device agnostic, so it can expand to more accelerators like XPU (#2610)
make method comparison device agnostic, so it can expand to more
accelerators like XPU

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-07-22 15:25:56 +02:00
e77924563a FIX Prefix tuning after transformers PR 38635 (#2662)
Due to https://github.com/huggingface/transformers/pull/38635, several
tests involving prefix tuning broke:

https://github.com/huggingface/peft/actions/runs/16417140904/job/46385751329

This PR fixes this by resolving two issues:

1. The _supports_cache_class attribute was removed, we can now assume
that it is True if the attribute does not exist.

2. We had special handling of past_key_values for GPTBigCodeForCausalLM
which is no longer required (nor valid) after that PR, so it is removed
depending on the transformers version.
2025-07-22 13:59:34 +02:00
fa85d10a7f Update README.md (#2659)
Update bibtex entry.
2025-07-21 14:36:02 +02:00
f3b97c3704 FEAT Allow LoRA to target nn.Parameter (#2638)
Normally, nn.Parameter cannot be targeted with LoRA adapters. This can
be problematic, e.g. when there are MoE layers that use nn.Parameter
directly, or when there is nn.Linear but the weight is passed directly
instead of calling forward (e.g. MHA).

It would be possible to craft a solution involving a special LoRA layer
for each of the modules that use nn.Parameter directly (e.g. lora.MHA)
but that doesn't scale. This PR implements a direct way to target
nn.Parameter, making use of torch.nn.utils.parametrize.

Using the feature requires passing target_parameters to the LoraConfig.
During the forward pass, when the parameter is accessed, the LoRA
weights are added to the weights while still ensuring that gradients
flow correctly to the LoRA weights.
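
A minimal, self-contained sketch of the underlying mechanism, using plain torch.nn.utils.parametrize (not PEFT's actual implementation; class name and shapes are made up for illustration):

```python
# Sketch: a parametrization adds a low-rank delta to a frozen nn.Parameter
# each time it is accessed, so gradients flow to the small LoRA-style factors.
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LowRankDelta(nn.Module):
    def __init__(self, out_features, in_features, r=4):
        super().__init__()
        # LoRA-style init: the delta starts at zero because lora_B is zero
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, weight):
        # called every time the parametrized parameter is accessed
        return weight + self.lora_B @ self.lora_A

linear = nn.Linear(16, 16)
linear.weight.requires_grad_(False)  # freeze the base weight
parametrize.register_parametrization(linear, "weight", LowRankDelta(16, 16))

loss = linear(torch.randn(2, 16)).sum()
loss.backward()  # gradients are computed for lora_A / lora_B, not the frozen weight
```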

Right now, only LoRA supports this feature. Moreover, it is not possible
to target multiple parameters of the same module with the same adapter.
A workaround is to use multiple adapters (i.e. with different names).

---------

Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>
2025-07-15 16:18:46 +02:00
22506a8e42 FIX Deploy method comp app: error in workflow file (#2645)
Fixing the error reported by the GitHub Actions check
"Deploy "method_comparison" Gradio to Spaces":

Check failure on line 11 in .github/workflows/deploy_method_comparison_app.yml
Invalid workflow file: The workflow is not valid.
.github/workflows/deploy_method_comparison_app.yml (Line: 11, Col: 13):
A mapping was not expected

The annotated lines in the workflow file:

permissions:
  contents: {}
2025-07-14 14:48:06 +02:00
1c75d96aca FIX: Prompt learning methods modules_to_save issue (#2646)
When using prompt learning methods, modules_to_save was not correctly
set automatically. This is really bad when using, for instance, sequence
classification tasks, which require the classifier layer to be added to
modules_to_save.

The issue was introduced in #2220 where it is wrongly assumed that the
PEFT config always has a modules_to_save attribute, which is not true
for prompt learning. In #2481, this was partly fixed by using getattr to
avoid an error. However, this did not resolve the fundamental issue that
for prompt learning, there is no such attribute, resulting in
modules_to_save not being applied.

This PR proposes to fix this by adding modules_to_save to the prompt
learning configs.
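
Hedged usage example (the config below is illustrative; after this fix, the classification head ends up in modules_to_save automatically when wrapping a sequence classification model):

```python
# Illustrative: prompt tuning for sequence classification; the classification
# head must be kept trainable and saved via modules_to_save.
from peft import PromptTuningConfig, TaskType

config = PromptTuningConfig(task_type=TaskType.SEQ_CLS, num_virtual_tokens=20)
```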
2025-07-14 13:57:33 +02:00
a4f9334f12 FEAT Add SHiRA Adapters (#2584)
Implements: Sparse High Rank Adapters

Paper: https://arxiv.org/abs/2406.13175
2025-07-14 11:16:10 +02:00
35000fda88 Fix #2634: Allow peft_method to be a string (#2635)
The auto-tagging code assumed that every `PeftConfig.peft_type` value is an Enum value but
when adding custom types without modifying the enum it is possible to have strings as well
(and the interface supports that).

This change allows for string values of `PeftConfig.peft_type` in the auto-tagging code.
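
A sketch of the kind of normalization this implies (not the exact code):

```python
# Sketch: peft_type can be a PeftType enum member or a plain string.
from peft import PeftType

def peft_type_name(peft_type) -> str:
    return peft_type.value if isinstance(peft_type, PeftType) else str(peft_type)
```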
2025-07-08 11:13:06 +02:00
0755ab93f6 FIX Faulty OFT parameter device test (#2630)
There is an error in an OFT test because .cpu() is called on a parameter
instead of a module. Calling it on a parameter is not an in-place
operation, so it has no effect.
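
The distinction in a nutshell (assumes a CUDA device is available):

```python
# Tensor.cpu() on a parameter only returns a copy; Module.cpu() moves the
# module's parameters in place, which is what the test needs.
import torch.nn as nn

layer = nn.Linear(2, 2).cuda()
layer.weight.cpu()   # returns a CPU copy, layer.weight stays on the GPU
layer.cpu()          # moves all of the module's parameters to the CPU
```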
2025-07-07 15:57:06 +02:00
fa9e429e93 FIX Correctly skip AWQ test based on torch version (#2631)
There is currently an issue with a multi-GPU test using AutoAWQ. Thus,
PR #2529 introduced an unconditional skip for this test. In #2596, a
condition was added to only skip with torch 2.7, as other torch versions
are not affected. However, the is_torch_version function does not
actually match minor and patch versions, so

is_torch_version("==", "2.7")

returns False when using version 2.7.1.

This PR fixes that by checking both "2.7.0" and "2.7.1" explicitly. This
is not very robust in case there are further patch releases of
PyTorch. However, that is unlikely, and introducing a more general
solution is IMO not worth it just for this instance.
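
For reference, an equivalent explicit check could look like this (a sketch using the packaging library, not the transformers helper):

```python
# Sketch: compare against exact patch versions, since a minor-only check
# like is_torch_version("==", "2.7") misses 2.7.1.
import torch
from packaging import version

release = version.parse(torch.__version__).release
skip_awq_multi_gpu_test = release[:3] in {(2, 7, 0), (2, 7, 1)}
```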
2025-07-07 15:55:37 +02:00
d76f3fe98c FIX Create mask function signature change (#2633)
We use create_mask_for_generate from transformers. It was introduced in
v4.53.0 but in v4.53.1, the function signature was changed to include
position_ids as a mandatory argument:

https://github.com/huggingface/transformers/pull/39194

This breaks our function call in PEFT. This PR fixes the function call
by passing position_ids. This in turn would break the function call with
transformers v4.53.0, thus a strict version check is being used for >=
v4.53.1.
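
Sketched version gate (illustrative; the actual call site lives in PEFT's prompt learning code):

```python
# Sketch: only pass position_ids on transformers versions that require it.
import transformers
from packaging import version

def mask_fn_needs_position_ids() -> bool:
    # create_mask_for_generate takes position_ids as a mandatory argument
    # from transformers v4.53.1 onward
    return version.parse(transformers.__version__) >= version.parse("4.53.1")
```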
2025-07-07 11:46:57 +02:00
b960d259e8 ENH Enable FSDP example for GPTQ quantized model (#2626)
Besides fixes, includes an example script that uses
`hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4`

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-07-07 11:08:03 +02:00
9f01809e70 FEAT: Add GH action to deploy method comparison app (#2625)
* FEAT Add GH action to deploy method comparison app

* Add to git credentials

* Different approach

* More fixes

* Fix for requirements

* Another approach

* Bah

* Change trigger to changes in method_comparison/

Manual trigger still possible

* Update method_comparison/README.md

* Satisfy Zizmor
2025-07-04 14:46:59 +02:00
4ad953aefb Bump version to 0.16.1.dev0 after release (#2632) 2025-07-04 14:46:48 +02:00
45996a1d6e Release 0.16.0 (#2629)
- Bump versions
- Update a comment to point to a new PR
- Remove a test skip that is obsolete after #2579
v0.16.0
2025-07-03 17:24:25 +02:00
79955723d8 Auto-tagging of PEFT models (#2599)
Features like inference need correctly set tags on the repo / the model card
in order to be available. Also the Hub uses tags to index the models and make
them searchable.

With this change, PEFT automatically tags models with `lora` if they happen
to be trained with LoRA, with the base model, and with a custom
`peft:method:<the method>` tag.

* Base model tags were never supported, they are now

Before PEFT simply ignored tags provided by the base model. Now the
base model tags are added to the PEFT-specific model tags.

* Tag 'transformers' and add pipeline tag if possible

We remove the `peft:method:*` tag because it needs more discussion
and is partially unrelated to this change. It is replaced by the necessary
`transformers` tag if the model is based on transformers.

We're also trying to resolve the pipeline tag automatically if it isn't set.
While there is the `transformers.pipelines.base.SUPPORTED_PEFT_TASKS` mapping
it is not sufficient to resolve the pipeline tag automatically since it is
not a 1:1 mapping. Only the causal LM case is a unique mapping.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-03 11:45:26 +02:00
180777ea97 TST Update diffusers hotswap tests (#2619)
When the diffusers hotswap tests were added to PEFT in #2120, the
diffusers test was marked as xfail because hotswapping was not yet
implemented in diffusers. This has long been achieved but the test was
not updated.

This PR now updates the diffusers test in PEFT and removes the xfail.
The new test is basically a copy of the corresponding test in diffusers.
Moreover, I enhanced the test according to #2611 to also ensure that
there are no CUDA graph re-records.
2025-07-02 16:56:55 +02:00
ce3b995f5b FIX CI Multi-GPU tests require device_map (#2612)
As discussed internally, since
https://github.com/huggingface/transformers/pull/37982, some multi-GPU
tests started failing because all parameters are loaded onto a single
GPU. This should now be fixed by providing an explicit device_map
instead of relying on "auto".

Furthermore, for an unknown reason, the HQQ test started failing as the
correlation dipped below 0.97 -- to 0.9696 actually. I think this is
close enough to not warrant further investigation. Therefore, I only
decreased the threshold.
2025-07-02 16:56:18 +02:00
05395fb2de FIX Type annotation error in method comparison (#2628)
Resolves an issue introduced by #2617
2025-07-02 16:33:22 +02:00
2bc97c02b7 FIX Improved handling of conv groups (#2567)
More generalized handling of groups argument in LoRA/DoRA conv layers
(previous solution: #2403).
2025-06-30 16:49:09 +02:00
e6577076bf FEAT Add C3A (Circular Convolution Adaptation) (#2577)
Add new PEFT method C³A (Circular Convolution Adaptation).

From "Parameter-Efficient Fine-Tuning via Circular Convolution":
https://arxiv.org/abs/2407.19342
2025-06-30 14:17:11 +02:00
456292649a FIX Update signature for resolve_lora_variant (#2618)
The function signature was missing **kwargs, which results in a failure
after merging #2571.
2025-06-27 16:57:05 +02:00
87703ba0e5 TST Skip (more) failing MacOS tests (#2620)
We have new MacOS tests that are failing, presumably due to the old
torch version used for MacOS GH CI runners. It's just a handful of tests
related to prefix tuning, IMO not worth trying to fix, as the error is
deep within transformers. Therefore, just skip these tests.
2025-06-27 16:56:51 +02:00
171da8ed60 FIX Attention mask dict issue, generate w/ gemma (#2579)
Resolves CI errors such as this one:

https://github.com/huggingface/peft/actions/runs/15481482956/job/43588020111#step:5:53182

After resolving that error, other errors can occur, but they're
unrelated and investigated independently.

After the transformers change in
https://github.com/huggingface/transformers/pull/37866, it can happen
that:

> Models using different types of attention in different layers (e.g.
gemma3) will now have a dict returned by
prepare_inputs_for_generation (one dict entry per attention type)

As PEFT operates on the attention mask for prompt learning methods, we
need to adjust the code for the possibility of attention_mask being a
dict. Right now, I simply extract the single value if the dict is just
one element. For other sizes, I just raise an error, as I don't know how
to deal with that. For our tests, this is enough but we might need to
find a better solution in the future.
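
The handling described above, as a minimal sketch (the exact PEFT code may differ):

```python
# Sketch: unwrap a single-entry attention-mask dict, error out otherwise.
def unwrap_attention_mask(attention_mask):
    if isinstance(attention_mask, dict):
        if len(attention_mask) != 1:
            raise ValueError("Cannot handle attention_mask dicts with more than one entry.")
        return next(iter(attention_mask.values()))
    return attention_mask
```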
2025-06-27 13:40:09 +02:00
bbc9f5dc8b FIX Avoid CUDA Graph re-record with hotswap (#2611) 2025-06-27 11:33:09 +02:00
d26f332543 ENH Method comparison: temp result files with ts (#2617)
In #2593, the timestamp was removed from the file name of result files.
This makes sense for the proper results, as those should have unique
file names and are tracked in git. However, for temporary and cancelled
results, this is not true. Therefore, the timestamp is added back in.

Moreover, I applied ruff to the MetaMathQA/ directory (it's not applied
automatically) and fixed some imports. Ruff seems to get confused about
local modules, thus the data and utils imports are treated differently,
but IMO no big deal.
2025-06-26 16:48:10 +02:00
5af0cbe4ee FIX: Trainable tokens error with DeepSpeed ZeRO3 (#2605)
Resolves #2603

Trainable tokens are erroring when using DS Z3 because the embedding
weights are not available on all ranks. This solution fixes this in an
efficient way that collects these weights on a single rank, initializes
them, and then broadcasts only the slice that is affected.
2025-06-26 16:47:58 +02:00
d936478f07 ENH Make OFT faster and more memory efficient (#2575)
Make OFT faster and more memory efficient. This new version of OFT is
not backwards compatible with older checkpoints and vice versa. To load
older checkpoints, downgrade PEFT to 0.15.2 or lower.
2025-06-26 14:27:03 +02:00
e34852f7b6 ENH Support Quantization-Aware LoRA with GPTQ (#2571)
Support for Quantization-Aware Low-Rank Adaptation (QALoRA) for GPTQ.
2025-06-26 11:51:38 +02:00
bda9665bc9 Results with number of parameters + full fine tuning (#2602)
This change updates all results with their respective number of
parameters (trained + absolute) and adds the newly introduced
full-finetuning.

In addition to these results there was also an issue with the
Makefile as it didn't consider the possibility of having experiments
that don't have an adapter config (e.g., full fine-tuning).
2025-06-24 18:00:46 +02:00
d67d03439c TST XPU regression tests with deterministic (#2600)
---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-06-24 15:42:03 +02:00
59ef3b93c8 FIX: Transformers VLM architecture changes (#2574)
FIX Transformers VLM architecture changes

Follow up to #2554
See discussion in https://github.com/huggingface/transformers/pull/38627

To quote:

> transformers PR #37033 re-arranges the way visual language models are
built by moving the LM head from the language model to the top-level
VLM (among other things).

A consequence of this is that the keys in the PEFT state_dict now also
follow the new architecture. This means that:

1. If a PEFT checkpoint was saved with the old architecture but is
   loaded with the new architecture, loading fails.
2. If a PEFT checkpoint was saved with the new architecture but is
   loaded with the old architecture, loading fails.

1. can be addressed by making use of the newly added
_checkpoint_conversion_mapping attribute for models with the new
architecture. In transformers, this is used to map old model state_dicts
to the new state_dict format. In PEFT, with some fiddling, we can use
the same mapping to make old PEFT state_dicts compatible with the new
architecture (backwards compatibility).

However, 2. is not easily addressed. We would need a reverse mapping for
this. This could be easily derived from _checkpoint_conversion_mapping,
but since this attribute doesn't exist on old models, we cannot do that.
Therefore, new checkpoints created with PEFT on these models won't load
successfully when users use old transformers (forward compatibility).

These cases are covered by the added unit tests; the tests covering
case 2 are marked as xfail.

If we could reliably detect that we are in case 2, we could warn the
user and advise them to upgrade transformers, but I don't know if it's
possible to figure this out.

We also allow users to pass their own key_mapping to from_pretrained and
load_adapter, though the documentation advises against it. This argument
could theoretically be used as a workaround in case there is indeed an
issue with prompt learning state_dicts.
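
To illustrate the general shape of such a key mapping (regex pattern to replacement), as a sketch rather than the exact PEFT/transformers code:

```python
# Sketch: remap old-architecture state_dict keys using a
# {regex pattern: replacement} mapping, similar in spirit to
# _checkpoint_conversion_mapping in transformers.
import re

def remap_keys(state_dict, key_mapping):
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in key_mapping.items():
            new_key, n = re.subn(pattern, replacement, new_key)
            if n:
                break
        remapped[new_key] = value
    return remapped
```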

Apart from these changes, I also made a small change to account for
https://github.com/huggingface/transformers/issues/38017#issuecomment-2935889679.
2025-06-23 17:39:40 +02:00
bd893a8a36 TST Enable some further XPU tests to pass (#2596)
---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-23 14:51:49 +02:00
5fe7f8f8ab ENH: Method comparison allow full finetuning (#2597)
- Allow full fine-tuning
- Add an experiment for full fine-tuning
- Rename some columns that had wrong names
- Remove redundant metric
- Factor out file size calculation (estimate for FT)
2025-06-19 18:10:20 +02:00