Implements DeLoRA: "Decoupling Angles and Strength in Low-rank
Adaptation" (https://huggingface.co/papers/2503.18225).
Similar to DoRA, DeLoRA decouples the angular learning from the
adaptation strength, but it additionally allows limiting the norm of the
weight change. This way, DeLoRA promises to reduce the risk of
catastrophic forgetting and to be more robust to hyperparameter settings
such as the learning rate.
The "LoRA Without Regret" blog
post (https://thinkingmachines.ai/blog/lora/) mentions that targeting
the MLP part of the transformer is more effective than targeting the
attention modules. This experiment tests this by targeting:
["gate_proj", "up_proj", "down_proj"]
instead of the default layers (["q_proj", "v_proj"]).
To match the parameter count we would get when targeting the attention
modules with rank 32, I chose rank 10. Testing on my machine, there is
indeed a nice improvement in the test score:
| metric | target attention | target MLP |
|----------------------|------------------|------------|
| test accuracy | 48.2% | 51.3% |
| # trainable params | 9175040 | 9461760 |
| peak memory reserved | 20.74 GB | 23.02 GB |
There is, however, also a marked increase in memory usage, despite
matching parameter count. Since the operations are different, this may
not be a surprise, but let's wait for the final verdict once this
experiment runs on our AWS instance.
Note: I also tested higher and lower ranks when targeting the MLP. The
effect on memory usage was negligible, but increasing the rank did
improve the score:
| metric | rank 8 | rank 10 | rank 12 | rank 32 |
|--------------------|---------|---------|----------|----------|
| test accuracy | 50.3% | 51.3% | 52.2% | 54.8% |
| # trainable params | 7569408 | 9461760 | 11354112 | 30277632 |
In the end, I chose only to add the rank 10 experiment to match the
number of trainable parameters.
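For reference, here is a minimal sketch of the MLP-targeting setup. The config class name (DeloraConfig) and the small example model are assumptions for illustration, not the exact benchmark configuration:

```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model
from peft import DeloraConfig  # assumed class name, analogous to LoraConfig

# Small stand-in model with a Llama-style MLP (gate/up/down projections).
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
config = DeloraConfig(
    r=10,  # rank used in the experiment (chosen to match r=32 on attention for the benchmark model)
    target_modules=["gate_proj", "up_proj", "down_proj"],  # MLP instead of q/v projections
)
model = get_peft_model(base, config)
model.print_trainable_parameters()
```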
Resolves #2809
Some models like Gemma3 apply a scalar to the embedding output. This
scalar needs to be taken into account when using trainable tokens or
when applying LoRA to the embedding layer.
A new initialization method was added to prompt tuning in #2815. This PR
adds an experiment config for this method to the MetaMathQA benchmark.
Testing locally, this got a test accuracy of 36%, compared to 25% with
random initialization.
This PR adds the set_requires_grad method to PEFT models (both PeftModel
and BaseTuner). As the name suggests, this is a method to set the
requires_grad attribute of the specified PEFT adapters.
For more general context, this is mostly relevant when dealing with
multiple adapters. As is, users can already set the active adapter(s)
with set_adapter, which automatically adjusts the requires_grad
attribute too, so that only the active adapters have grads enabled.
However, there can be situations where the activity status and
requires_grad should differ. Right now, users would need to set
requires_grad manually to deal with that, which is error prone (e.g.
forgetting modules_to_save).
This PR closes this gap in the API.
As this functionality is quite general purpose, I added a
set_requires_grad function to functional.py for easier integration.
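A hedged usage sketch (the exact signature of set_requires_grad may differ from the final API; the adapter names are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = get_peft_model(base, LoraConfig(task_type="CAUSAL_LM"))
model.add_adapter("other", LoraConfig(task_type="CAUSAL_LM"))

# "default" stays the active adapter, but we also want gradients for "other",
# independently of which adapter is active (assumed signature shown here).
model.set_requires_grad("other", requires_grad=True)
```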
Note: The set_requires_grad method will raise an error when called with
prompt learning methods like prompt tuning. This is because these
methods don't share a common base class (BaseTuner and BaseTunerLayer)
that would allow adding this API. Moreover, they only support a single
adapter at a time, so there is not much need for this method in the
first place.
A side effect of not supporting prompt learning is that, on the
PeftModel, we are free to let set_requires_grad accept more than one
adapter, which would otherwise be difficult because prompt learning only
allows a single adapter.
While memory usage correlates with the number of trainable parameters, reporting this number directly
makes it easier to check that methods use similar numbers of trainable parameters, and outliers
can be spotted and inspected easily.
So far, when using add_weighted_adapter, there was an implicit
assumption that all weights are positive. This PR allows negative
weights to be passed.
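An illustrative sketch, assuming a PeftModel `model` that already has two LoRA adapters loaded (adapter names are placeholders):

```python
# Combine two adapters, subtracting half of the second one's contribution;
# negative weights like -0.5 are what this PR newly permits.
model.add_weighted_adapter(
    adapters=["style", "artifacts"],
    weights=[1.0, -0.5],
    adapter_name="style_minus_artifacts",
    combination_type="linear",
)
model.set_adapter("style_minus_artifacts")
```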
---------
Co-authored-by: Valentin Teutschbein <valentin.teutschbein@student.hpi.uni-potsdam.de>
Implements the paper "Exploring Sparsity for Parameter Efficient Fine
Tuning Using Wavelets" (https://arxiv.org/abs/2505.12532).
WaveFT enables fine-grained control over the number of trainable
parameters by directly learning a sparse set of coefficients in the
wavelet domain of residual matrices. Experiments show that it works well
in the text-to-image generation space.
The reset_sessions function is removed, but it is also no longer
necessary to call it for the purpose we used it for.
Moreover, the deprecated use_auth_token argument is now fully removed:
everywhere we used to pass it, it has been dropped, unless a user passes
it explicitly.
Also, the deprecated local_dir_use_symlinks argument is removed.
Resolves #2772
Fixes several edge cases with unusual layer names or target modules.
1. As #2772 stated, if "weight" is part of a layer name, it would be
treated incorrectly when creating the PEFT state_dict.
2. Similarly, when the adapter name itself is part of a layer name.
Some of these errors would pass silently, which is especially bad (e.g.
a weight not being loaded but no error raised).
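To make the first edge case concrete, here is a hedged toy illustration (the module name is made up; it is not one of the originally reported cases):

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model, get_peft_model_state_dict

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # The substring "weight" in the module name is what used to trip up
        # the PEFT state_dict key handling.
        self.weight_proj = nn.Linear(16, 16)

    def forward(self, x):
        return self.weight_proj(x)

model = get_peft_model(Toy(), LoraConfig(target_modules=["weight_proj"]))
print(list(get_peft_model_state_dict(model).keys()))
```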
I also added some tests that were not failing before, in order to cover
some previously uncovered cases and to lay out some basic functionality.
While working on this, I also noticed that it was possible to target a
BaseTunerLayer with modules_to_save and trainable_token_indices (e.g.
the lora_A and lora_B nn.Linear would be replaced with
ModulesToSaveWrapper). I don't think this is ever desired, so we now
raise an error if this is detected.
This PR adds the PEFT version to the adapter_config.json. This can be
useful in the future -- for instance when we change the state dict
format of a PEFT method, we can convert it in a backwards compatible way
based on the PEFT version being used. It can also be useful for
debugging by providing an easy way to see the PEFT version that was used
to train a PEFT adapter.
Notes:
In #2038, we made a change to PEFT configs to make it so that even if
new arguments are added to a config, it can still be loaded with older
PEFT versions (forward compatibility). Before that change, adding the
PEFT version would have been quite disruptive, as it would make all PEFT
configs incompatible with older PEFT versions. Said PR was included in
the 0.14.0 release from Dec 2024, so we can expect the vast majority of
PEFT users to use this version or a more recent one.
If the PEFT version is a dev version, the version tag is ambiguous.
Therefore, I added some code to try to determine the commit hash. This
works if users installed PEFT with git+...@<HASH>. Unit testing that the
function to determine the hash works with these types of installs is not
trivial. Therefore, I just patched the function to return a fixed hash.
I did, however, test it locally and it works:
```sh
python -m pip install git+https://github.com/huggingface/diffusers.git@5e181eddfe7e44c1444a2511b0d8e21d177850a0
python -c "from peft.config import _get_commit_hash; print(_get_commit_hash('diffusers'))"
```
Also note that I tried to make the retrieval of the hash super robust by
adding a broad try ... except. If there is an error there, e.g. due to a
busted install path, we never want this to fail, but rather just accept
that the hash cannot be determined (we add @UNKNOWN in this case).
If users installed a dev version of PEFT in a different way, e.g. using
git clone && pip install ., the commit hash will not be detected. I
think this is fine; I really don't want to start shelling out with git
just for this purpose.
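For illustration, this is roughly what reading the new field could look like (the key name and value format are assumptions based on the description above; the path is a placeholder):

```python
import json

with open("/tmp/peft/my-adapter/adapter_config.json") as f:
    config = json.load(f)

# Release installs: a plain version string; dev installs: the version plus the
# detected commit hash, or "@UNKNOWN" if the hash could not be determined.
print(config.get("peft_version"))
```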
Right now, get_model_status() and get_layer_status() only report on
BaseTunerLayers, but it would be helpful if they could also report on
auxiliary modules. This PR now includes those.
To facilitate this, a few attributes and methods were added to
AuxiliaryTrainingWrapper and subclasses to make them more similar to
BaseTunerLayer (e.g. the adapter_layer_names attribute). These
attributes and methods were assumed to be present in the code that
determines the model and layer status.
Resolves #2783.
Most PEFT layers (BaseTunerLayers) expose the in_features and
out_features attributes. Therefore, other packages like diffusers may
expect this attribute to exist. However, there were a few PEFT methods
where these attributes were missing:
- LoHa
- LoKr
- LN Tuning
- Trainable Tokens
The layers of these methods now also expose the attributes.
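A quick, hedged sanity check of what this change enables, using a toy model and LoHa as an example:

```python
import torch.nn as nn
from peft import LoHaConfig, get_peft_model

base = nn.Sequential(nn.Linear(16, 32))
model = get_peft_model(base, LoHaConfig(target_modules=["0"]))

loha_layer = model.base_model.model[0]  # the LoHa layer wrapping the Linear
print(loha_layer.in_features, loha_layer.out_features)  # now exposed: 16 32
```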
Implementation
To avoid code duplication, I factored out the whole code block in LoRA
layers that extracts these attributes, since LoRA has the most
exhaustive list of checks. The new utility function has the exact same
functionality and can now be used by other PEFT methods.
I updated the four PEFT methods mentioned above to use this new
function, but I did not update PEFT methods that already handled it, as
there wasn't really a need (they check one or two layer types at most,
so there is little duplication).
Explain how to use multiple adapters (e.g. 2 LoRA adapters) at the same
time, as the API is not quite intuitive and there are some footguns
around trainable parameters.
This question has come up multiple times in the past (for recent
examples, check #2749 and #2756). Thus it's a good idea to properly
document this.
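The gist of the new guide, as a hedged sketch (paths and adapter names are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
model = PeftModel.from_pretrained(base, "path/to/adapter_a", adapter_name="a")
model.load_adapter("path/to/adapter_b", adapter_name="b")

# Activate both adapters at once; note that activating adapters can also flip
# their requires_grad flags, which is one of the footguns the guide covers.
model.base_model.set_adapter(["a", "b"])
```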
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
- The warning message was missing spaces between sentences.
- Added ' around strings for clarity.
- For one warning that extended another warning, the addition is now
  placed at the start instead of the end, because the other warning can
  be quite long, which could lead to users missing the addition.
For more context on this warning, see #2254
See
https://github.com/huggingface/diffusers/issues/11816#issuecomment-3281290153
This PR implements two small improvements to the speed of adapter
injection. On a benchmark based on the linked issue, the first change
leads to a speedup of 21% and the second change of another 3%. It's not
that much, but as the changes don't make the code more complicated,
there is really no reason not to take them.
The optimizations don't add any functional change but are simply based
on not recomputing the same values multiple times. Therefore, unless I'm
missing something, they should strictly improve runtime.
Deduplicate a lot of redundant code from the PEFT methods' model.py files:
merge_and_unload
unload
delete_adapter
set_adapter
enable_adapter_layers
disable_adapter_layers
_replace_module
_unload_and_optionally_merge
_mark_only_adapters_as_trainable
_check_new_adapter_config
_check_target_module_exists
_prepare_adapter_config
__getattr__
get_peft_config_as_dict (fully deleted)
Related changes:
A new module, functional.py, is introduced, which contains functions
(just reimported from elsewhere) that can be useful for libraries that
want to integrate PEFT. I would suggest that we should treat them as
public API and thus guarantee backwards compatibility.
I also deduplicated almost identical
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING constants by copying
them from LoRA and only overriding a few values that differ. Moreover,
some PEFT methods didn't have their own
TRANSFORMERS_MODULES_TO_XXX_TARGET_MODULES_MAPPING but used the one from
LoRA instead. They now each have their own constant, which is a copy of
the LoRA one.
The test_config.py tests were missing a few configs from recently added
PEFT methods. Those are now included. After adding those, it was
revealed that for C3A and trainable tokens, super().__post_init__() was
not being called. This is now done.
Description
At the moment, we strongly couple the active adapter with
requires_grad=True. Concretely, when we call model.set_adapter(name), we
automatically assume that this adapter should not only be made active,
but that its requires_grad should also be set to True.
For the purpose of training PEFT models, this is fair. However, when
loading PEFT models for inference, this is not desired. Generally, for
inference, we don't need requires_grad=True, but as is, it is enabled.
This is generally not a severe bug, since in the inference code, we
don't perform any updates, thus we don't inadvertently update a weight
because it wrongly has requires_grad=True -- this is probably why it
went unnoticed so far. However, it could lead to worse runtime
performance and memory overhead when PyTorch records grads for those
parameters (which it shouldn't if called with torch.inference_mode, but
some users may forget to use this). Therefore, this bug is still worth
fixing.
Example

### With `modules_to_save`
A very basic example where the current PEFT fails:
```python
import os

from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, get_peft_model

model_id = "facebook/opt-125m"
path = "/tmp/peft/2759"
if not os.path.exists(path + "/adapter_model.safetensors"):
    model = AutoModelForCausalLM.from_pretrained(model_id)
    config = LoraConfig(target_modules=["q_proj", "v_proj"], modules_to_save=["lm_head"], r=8)
    model = get_peft_model(model, config)
    model.save_pretrained(path)
    del model

model = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(model, path)
```
`modules_to_save` should not have grads enabled, but currently it does.
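A quick way to see the problem with the example above (the attribute path is specific to this OPT model):

```python
# lm_head is wrapped via modules_to_save; after loading for inference, its
# parameters should have requires_grad=False, but before this fix they were True.
print(any(p.requires_grad for p in model.base_model.model.lm_head.parameters()))
```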
### With multiple adapters
There is also an issue when loading more than one adapter:
```python
model = PeftModel.from_pretrained(...)
assert not any(p.requires_grad for p in model.parameters())  # works
```
So far, so good, the first adapter does not have `requires_grad`.
```python
model.load_adapter(...)
assert not any(p.requires_grad for p in model.parameters())  # fails
```
The load_adapter call inadvertently sets requires_grad=True for the
weights of the _first_ adapter. The reason is that when the second
adapter is loaded, we call set_adapter with the first adapter to ensure
that it remains the active adapter. However, due to the coupling of
active adapter and requires_grad, this results in setting
requires_grad=True for the first adapter.
The PR relaxes this coupling by allowing set_adapter to be called with
an additional argument, inference_mode. If it is set to True,
requires_grad will not be enabled, even if the adapter is activated.
Note that the examples above would also fail for modules_to_save and
trainable tokens, not only for the LoRA/LoHa/... weights.
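A hedged sketch of the new escape hatch; where exactly the argument is exposed may differ, and the adapter name is a placeholder:

```python
# Make "other" the active adapter without flipping its requires_grad to True.
model.base_model.set_adapter("other", inference_mode=True)
assert not any(p.requires_grad for p in model.parameters())
```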
Still open bugs
The proposed solution is unfortunately not perfect. Right now, we do
pass inference_mode based on the PEFT config of the adapter being added,
which helps with the original issue described above. However, even this
is not absolutely correct, because inference_mode of the second adapter
does not necessarily have the same value as inference_mode of the first
adapter. To illustrate how this can go wrong, I added an xfailing test:
test_loading_model_requires_grad_set_correctly_switch_inference_mode
I believe that this use case is rarer than the ones described at the
beginning, so IMO it is okay to have this bug because we fix more common
bugs. However, LMK if you disagree.
Related to this, I noticed that many tests in
test_custom_models.TestRequiresGrad had code like this:
```python
config0 = FooConfig(...)
peft_model = get_peft_model(MLP(), config0)
config1 = FooConfig(..., inference_mode=True)  # <==
peft_model.add_adapter("adapter1", config1)
```
This now fails because of the reason just given. I removed
inference_mode=True here and the tests pass again.
Note that the only reason inference_mode=True was passed here is that
AdaLoRA cannot load 2 adapters in training mode and thus requires this.
Later PEFT methods without this restriction blindly
copied the AdaLoRA test. For those PEFT methods, I removed
inference_mode=True.
However, this also means that the AdaLoRA tests now fail. I thus marked
them as xfail.
To properly fix this bug, I think we would have to refactor the code to
isolate set_adapter (i.e. determining the active adapter) and setting
requires_grad into separate code paths, as they're orthogonal. Moreover,
these attributes are being set all over the place, which makes it hard
to reason about where these attributes are being changed. This should be
streamlined.
Making these changes while not breaking any existing code is not
trivial (or maybe even impossible). Therefore, I went the easier way for
the time being with this PR. Maybe a bigger refactor could be envisioned
for a version 1.0 release of PEFT.
Related changes
While working on this, I noticed that LNTuning was completely buggy when
calling set_adapter. This is now fixed.
Moreover, since I had to touch update_layer everywhere, I ensured that
they all take kwargs for consistency.
This PR adds support for Arrow, a modular routing mechanism for LoRA experts introduced here, as well as the refinement method GenKnowSub, proposed in our ACL 2025 Main Conference paper. GenKnowSub enhances Arrow by subtracting a general-domain LoRA from task-specific ones prior to routing, leading to improved generalisation and modularity.
There was an issue where forward hooks would accumulate during
generation: one hook was registered per forward step, and generate calls
forward multiple times. This is already undesirable, but to make matters
worse, only the last hook was removed afterwards, so the hooks piled up
across calls.
LeRobot uses dataclasses to manage policy configs. If we want to
support LeRobot policy fine-tuning, it would be easiest to support
these configs in `get_model_config`.
While it is possible to fix this on LeRobot's side (by adding a to_dict implementation to the config classes), I think it is cleaner to support it on our side, since the cost is relatively low and dataclasses are getting more popular anyway.
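A sketch of the idea only; the actual change lives inside PEFT's get_model_config and may look different:

```python
import dataclasses

def config_to_dict(config):
    # Fall back to dataclasses.asdict for dataclass-based configs (as used by
    # LeRobot) instead of requiring a transformers-style to_dict method.
    if dataclasses.is_dataclass(config) and not isinstance(config, type):
        return dataclasses.asdict(config)
    return config.to_dict()
```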
Thanks @xliu0105 for raising this issue and proposing a fix.
In #2737, we fixed some code that relied on the deprecated attribute,
but some of it was missed, as it only runs on the nightly CI with
multiple GPUs. This PR fixes the remaining occurrences.
Note that the original transformers code that this solution was based on
no longer exists, as transformers now initializes the cache lazily, so
pre-allocating the keys and values to the correct device is not
necessary. But since prefix tuning inserts "virtual" keys/values, we
still have to ensure the correct device in PEFT.
I have tested the failing tests locally and they pass.
There is an AWQ test that has been failing since torch 2.6 and was
marked as xfail for torch==2.7. However, torch 2.8 is now out and the
test is still failing. Therefore, the xfail now checks for torch>=2.7.
As AWQ is no longer being maintained, we should expect this situation to
deteriorate over time and eventually we'll have to remove it. But for
the time being, it still appears to mostly work, so I suggest we leave
it as is.
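Schematically, the adjusted marker looks something like this (the real test and its condition helper in the PEFT test suite differ; this only illustrates the torch>=2.7 check):

```python
import pytest
import torch
from packaging import version

@pytest.mark.xfail(
    version.parse(torch.__version__) >= version.parse("2.7.0"),
    reason="Known AWQ failure on torch >= 2.7",
)
def test_awq_lora_output():
    ...  # the actual AWQ + LoRA forward check lives in the test suite
```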
This PR migrates Activated LoRA (aLoRA) support from a standalone GitHub repository (see above) to PEFT itself.
Note there is also an active PR for vLLM inference support for Activated LoRA: vllm-project/vllm#19710. There are also collections of aLoRA models on the Hugging Face Hub (in the ibm-granite org); note that these preexisting models run off of the standalone GitHub repo and will be updated to work with this new PEFT feature if merged.
Description of changes: Activated LoRA is a modification of the LoRA architecture that "activates" the adapter weights only on tokens coming after a specified invocation_string. As a result, the KV values for the tokens coming before the activation match the KV values of the base model. This allows the KV cache for the input to be shared between the base model and the adapter model, enabling major speedups in inference pipelines (e.g. agentic pipelines) that want to use both base models and adapter models. See the paper for a detailed exploration of use cases and further elaboration.
Other notes:
The crux of the changes is in layer.py. Everything else simply manages the alora_offsets quantity, which defines where the weights start to be activated. This is determined by scanning the input strings for the invocation_string defined in the aLoraConfig.
I believe that aLoRA really only makes sense for CausalLMs, hence I've only implemented this for that model type.
Merging doesn't make sense for aLoRA adapters since the weights are not universally applied to all tokens.
I used the LoRA code as a starting point, but did not implement various features of that code that seemed non-essential here.
As of now, invocation_string should probably start and end with special tokens, to avoid tokenizer issues at the boundary. Open to suggestions on how to make this more general if needed.
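To give an idea of the intended usage, here is a hedged configuration sketch; the import name and every argument except invocation_string are assumptions based on this description, and the invocation string itself is a placeholder:

```python
from peft import aLoraConfig  # assumed import, as referenced above

config = aLoraConfig(
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj"],
    # The adapter weights only "activate" on tokens after this string; ideally
    # it starts and ends with special tokens to avoid tokenizer boundary issues.
    invocation_string="<|assistant|>",
)
```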
---------
Co-authored-by: githubnemo <githubnemo@users.noreply.github.com>