Implements DeLoRA: "Decoupling Angles and Strength in Low-rank
Adaptation" (https://huggingface.co/papers/2503.18225).
Similar to DoRA, DeLoRA decouples the angular learning from the
adaptation strength, but it also makes it possible to bound the norm of the weight change.
This way, DeLoRA promises to reduce the risk of catastrophic forgetting
and to be more robust to hyper-parameter settings such as the learning
rate.
The "LoRA Without Regret" blog
post (https://thinkingmachines.ai/blog/lora/) mentions that targeting
the MLP part of the transformer is more effective than targeting the
attention modules. This experiment tests that claim by targeting:
["gate_proj", "up_proj", "down_proj"]
instead of the default layers (["q_proj", "v_proj"]).
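A rough sketch of this setup (assuming PEFT's `DeloraConfig` class and the usual `r`/`target_modules` arguments; the experiment's actual JSON config may differ):
```
# Sketch only, not the experiment's exact config file.
from peft import DeloraConfig, get_peft_model  # assumes the config class is named DeloraConfig

config = DeloraConfig(
    r=10,  # see the note on the rank choice below
    target_modules=["gate_proj", "up_proj", "down_proj"],  # MLP instead of the default ["q_proj", "v_proj"]
)
model = get_peft_model(base_model, config)  # base_model: the Llama base model used in the benchmark
```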
I chose the rank so that the parameter count matches what we get when
targeting the attention modules with rank 32, which works out to rank
10. Testing on my machine, there is indeed a nice improvement in the
test score:
| metric | target attention | target MLP |
|----------------------|------------------|------------|
| test accuracy | 48.2% | 51.3% |
| # trainable params | 9175040 | 9461760 |
| peak memory reserved | 20.74 GB | 23.02 GB |
There is, however, also a marked increase in memory usage despite the
matched parameter count. Since the operations are different, this may
not be a surprise, but let's wait for the final verdict once this
experiment runs on our AWS instance.
Note: I also tested higher and lower ranks when targeting the MLP. The
effect on memory usage was negligible, but increasing the rank did
improve the score:
| metric | rank 8 | rank 10 | rank 12 | rank 32 |
|--------------------|---------|---------|----------|----------|
| test accuracy | 50.3% | 51.3% | 52.2% | 54.8% |
| # trainable params | 7569408 | 9461760 | 11354112 | 30277632 |
In the end, I chose only to add the rank 10 experiment to match the
number of trainable parameters.
A new initialization method was added to prompt tuning in #2815. This PR
adds an experiment config for this method to the MetaMathQA benchmark.
Testing locally, this got a test accuracy of 36%, compared to 25% with
random initialization.
While memory usage correlates with the number of trainable parameters, recording this number
directly makes it easier to verify that methods use similar numbers of trainable parameters,
and outliers can be inspected easily.
Implements the paper "Exploring Sparsity for Parameter Efficient Fine
Tuning Using Wavelets" (https://arxiv.org/abs/2505.12532).
WaveFT enables fine-grained control over the number of trainable
parameters by directly learning a sparse set of coefficients in the
wavelet domain of residual matrices. Experiments show that it works well
in the text-to-image generation space.
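As a conceptual sketch of that mechanism (illustrative only; `inverse_wavelet_transform` is a hypothetical helper, not PEFT's actual API):
```
# Conceptual sketch of the WaveFT idea: train a sparse set of wavelet-domain
# coefficients and reconstruct the weight update with an inverse wavelet transform.
# `inverse_wavelet_transform` is a hypothetical helper, not PEFT's actual API.
import torch

def waveft_delta(coeff_values, coeff_indices, shape, inverse_wavelet_transform):
    coeffs = torch.zeros(shape)                                 # dense wavelet-coefficient matrix
    coeffs[coeff_indices[0], coeff_indices[1]] = coeff_values   # scatter the trainable sparse entries
    return inverse_wavelet_transform(coeffs)                    # delta W added to the frozen weight
```
The number of trainable parameters is then simply the number of retained coefficients, which is what allows the fine-grained control mentioned above.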
- Updated results for OFT, C3A and SHiRA
- New results for trainable tokens (for completeness)
Trainable tokens wasn't tuned much; we could probably search for better
tokens and increase the learning rate. We can do this later.
Similar to #2395, this benchmark serves to compare different PEFT
methods on an equal basis. This time, the goal is to measure metrics
related to text generation, most notably speed and memory usage. The
results should be easy to reproduce and compare.
The actual experimental settings and results have yet to be added.
* Method Comparison: Improve formatting/layout of table
Quick improvement to reduce the dominance of columns like `{peft,train}_config` and make
numbers a bit more readable through proper decimal/thousands formatting.
* Bump gradio version to accommodate required fixes
* FEAT Add GH action to deploy method comparison app
* Add to git credentials
* Different approach
* More fixes
* Fix for requirements
* Another approach
* Bah
* Change trigger to changes in method_comparison/
Manual trigger still possible
* Update method_comparison/README.md
* Satisfy Zizmor
Add new PEFT method C³A (Circular Convolution Adaptation).
From "Parameter-Efficient Fine-Tuning via Circular Convolution":
https://arxiv.org/abs/2407.19342
In #2593, the timestamp was removed from the file name of result files.
This makes sense for the proper results, as those should have unique
file names and are tracked in git. However, for temporary and cancelled
results, this is not true. Therefore, the timestamp is added back in.
Moreover, I applied ruff to the MetaMathQA/ directory (it's not applied
automatically) and fixed some imports. Ruff seems to get confused about
local modules, thus the data and utils imports are treated differently,
but IMO that's no big deal.
This change updates all results with their respective number of
parameters (trainable + total) and adds the newly introduced
full fine-tuning.
In addition to these results, there was also an issue with the
Makefile: it didn't consider the possibility of experiments
that don't have an adapter config (e.g., full fine-tuning).
- Allow full fine-tuning
- Add an experiment for full fine-tuning
- Rename some columns that had incorrect names
- Remove redundant metric
- Factor out file size calculation (estimate for FT)
This change adds tracking of the number of (trainable) parameters for each experiment.
Tracking the number of parameters, trainable and total, will make the results
much more transparent regarding model capacity. If a method were accidentally
trained with far more or fewer trainable parameters, the comparison would be
unfair. Having these numbers will also make benchmarking parameter efficiency
easier.
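For reference, a minimal sketch of how such numbers can be obtained (the benchmark's actual helper may differ):
```
# Minimal sketch; the benchmark's actual helper may differ.
def count_parameters(model):
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total
```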
These are the first results for the MetaMathQA task and also the first
test of the Makefile used to run these tests.
The Makefile offers the functionality to run individual experiments by
specifying the result you want to have, e.g.
`make results/adalora--llama-3.2-3B-rank32[...].json`. Alternatively,
you can simply run `make` (i.e., `make all`), which runs all experiments
that don't have a result yet or whose results are older than their
configs (comparing result timestamp and config timestamp).
The results are from the main branch. No errors happened during the run.
There were OOM errors on a compute instance that used an A10G with
24 GB. An L40S with 48 GB was fine.
* Make sure to use original batch size for OFT
This was not done previously because of runner memory constraints.
* Remove timestamp from result files
We're tracking the results in git for now, which makes
looking back easy enough (`git restore -s <rev> results`).
Removing the timestamp makes it easier for `make` to track
which results are already computed and which need to be recomputed.
Since `DataFrame.query` is potentially vulnerable, we limit the possible
filter input to a fixed grammar that looks roughly like this:
```
expr = left op right
left = ( expr ) | literal
right = ( expr ) | literal
op = in | >= | < | <= | == | and | or
```
This gives us boolean operations and basic comparisons. Note that
`literal` can be arbitrary Python literals (strings, tuples, ...).
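A minimal sketch of how such a restriction could be enforced with Python's `ast` module before the string is handed to `DataFrame.query` (illustrative, not the app's exact implementation):
```
# Illustrative sketch, not the app's exact code: validate the filter string against
# the restricted grammar before passing it to DataFrame.query.
import ast

ALLOWED_CMP_OPS = (ast.In, ast.GtE, ast.Lt, ast.LtE, ast.Eq)

def is_allowed(node):
    if isinstance(node, ast.Expression):
        return is_allowed(node.body)
    if isinstance(node, ast.BoolOp):  # `and` / `or`
        return all(is_allowed(value) for value in node.values)
    if isinstance(node, ast.Compare):  # `in`, `>=`, `<`, `<=`, `==`
        return (
            all(isinstance(op, ALLOWED_CMP_OPS) for op in node.ops)
            and is_allowed(node.left)
            and all(is_allowed(comp) for comp in node.comparators)
        )
    if isinstance(node, (ast.Tuple, ast.List)):  # literal collections
        return all(is_allowed(elt) for elt in node.elts)
    return isinstance(node, (ast.Constant, ast.Name))  # literals and column names

def safe_query(df, filter_str):
    tree = ast.parse(filter_str, mode="eval")
    if not is_allowed(tree):
        raise ValueError(f"filter not allowed: {filter_str}")
    return df.query(filter_str)
```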
- Print early how the experiment is categorized
- Last resort save_dir so that results are not lost
- Catch errors in general, not only OOM
- Log error message
- Catch checkpoint saving in try ... except, just in case (otherwise,
  if it fails, no logs are written); see the sketch below
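The last point could look roughly like this (a sketch under assumptions; `write_logs` and `log_data` are hypothetical names):
```
# Sketch only; write_logs and log_data are hypothetical placeholders.
try:
    model.save_pretrained(save_dir)  # checkpoint saving may fail (e.g., disk full)
except Exception as exc:
    print(f"Saving the checkpoint failed: {exc}")
finally:
    write_logs(log_data, save_dir)  # ensure logs are written even if saving fails
```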
DoRA now supports Conv1d layers. Notably, the check for how to deal with layers other than linear was softened from requiring 4 dimensions to 3 dimensions, since `Conv1d` weights have 3 dimensions instead of 4.
This is a follow up to #2464 and issue #2441.
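One plausible reading of the relaxed check (an illustration, not the exact PEFT source):
```
# Illustration only, not the exact PEFT source: conv-style weights are detected by
# their number of dimensions; Conv1d weights have 3, Conv2d weights have 4, linear 2.
def is_conv_weight(weight):
    return weight.dim() >= 3  # previously this effectively required 4 dimensions
```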
Entails documentation for RandLoRA and a slightly updated example usage in the model.py docstring.
Also adds RandLoRA to the method comparison.
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Introduction of a method evaluation suite.
We generally face the problem that there is little knowledge about which PEFT methods perform best. To this end, we decided to build an evaluation suite that has defined tasks and shared hyper-parameters and can be extended with new tasks and new method configurations over time.
For the sake of comparability, we've decided not to incorporate user-submitted results, but we encourage users to inspect the results, suggest new experiments, and improve the configuration of methods if they're deemed unfavorable.
As of now, there is only one task, based on the MetaMathQA dataset, which has the benefit of being complex while still fitting on a consumer GPU.
Notable changes in this squash:
* Add default training params
The experiment-specific training params use the default training params
but can override any parameter from them if needed. This way it's
easier to make a change to all experiments (say, if I want to change the
base model, I don't need to edit each individual
training_parameters.json).
* Add possibility to change attn implementation
However, both flash attention 2 and flex attention are slower on my
system. Thus, stay with default None (-> SDPA).
* Refactor to use GenerationConfig
This makes it easier to use, say, the static cache, which is the new default,
as it's faster (apart from the first pass); see the sketch after this list.
* Better parsing of answers
E.g., 1/2 is now recognized as equal to 0.5
* Keep adapter file by default after train run
But add --clean to delete it.
Keeping the adapter can be useful if the user wants to run further tests
with the trained model.
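Regarding the GenerationConfig refactor mentioned above, a minimal sketch (assumed settings, not the benchmark's exact values):
```
# Minimal sketch with assumed values, not the benchmark's exact configuration.
# `model` and `inputs` come from the usual tokenizer/model setup.
from transformers import GenerationConfig

generation_config = GenerationConfig(
    max_new_tokens=256,             # assumed value for illustration
    cache_implementation="static",  # static cache: slower first pass, faster afterwards
)
outputs = model.generate(**inputs, generation_config=generation_config)
```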
---------
Co-authored-by: Benjamin Bossan <benjamin.bossan@gmail.com>