ac81541778
🌐 [i18n-KO] Translated gemma3n.md to Korean ( #40873 )
...
* fix: manual edits
* Apply suggestions from code review
Apply suggestions from code review and make additional revisions
Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com >
* Apply suggestions from code review
Apply suggestions from code review — updated inline links for related text
* Apply suggestions from code review
Apply suggestions from code review - final
* Update docs/source/ko/_toctree.yml
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-10-17 09:57:05 -07:00
e7592f2508
[docs] Manual tp-plan ( #41674 )
...
* manual tp-plan
* feedback
2025-10-17 09:38:26 -07:00
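A manual tensor-parallel plan is essentially a mapping from parameter-name patterns to sharding styles. A minimal sketch, assuming the common colwise/rowwise scheme — the exact key patterns and style names here are illustrative assumptions, not copied from the new docs:

```python
# Hypothetical manual tp-plan: module-name patterns -> shard style.
# Pattern syntax and the "colwise"/"rowwise" names are assumptions.
tp_plan = {
    "model.layers.*.self_attn.q_proj": "colwise",
    "model.layers.*.self_attn.k_proj": "colwise",
    "model.layers.*.self_attn.v_proj": "colwise",
    "model.layers.*.self_attn.o_proj": "rowwise",
    "model.layers.*.mlp.gate_proj": "colwise",
    "model.layers.*.mlp.up_proj": "colwise",
    "model.layers.*.mlp.down_proj": "rowwise",
}
```

Pairing a column-parallel projection (split output dim) with a row-parallel one (split input dim) keeps communication down to one all-reduce per block.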
75da795d8f
🚨 Remove torch.fx support ( #41683 )
...
* remove all
* fix comments
* better checks
* doc
2025-10-17 16:12:46 +02:00
10de06dace
🚨 [v5] Refactor RoPE for layer types ( #39847 )
...
* update
* batch update model code
* typos
* too many diffs, dump
* dump again
* another dump
* fix copies
* make `rope_scaling_dict` self attr
* fix a few more tests
* another update
* fix a few more tests, hopefully last ones
* fix copies
* fix copies again
* fix newly added models, I hate rebasing on main
* update config files
* modular files
* fix rope utils test
* docstring has to be indented more, why?
* oops, forgot to update some modular files
* copy from doesn't copy decorators?
* fix overridden test as well
* add a new test
* fix failing tests again
* update docstrings
* fix phi3
* fix two models
* fix copies
* forgot to add
* stupid bug from modular conversion
* fix slow tests
* update to call rotary emb once per model forward
* 3K tests failing?!
* update
* update more models
* fix copies
* fix the rest of tests hopefully
* fix after rebase
* fix the rope tests
* fix docs omni
* change a bit
* models with layer types
* why was it deleted?
* fix a few tests
* fix last test!
* delete extra empty lines
* add a test case
* more changes
* fix models
* typing hint for nested rope params
* missed when resolving conflicts
* delete layer types and fix typo
* fix copies
* fix copies
* update docs text
* docs
* huuge update all models
* fix copies
* rename attr to align with new format
* delete redundant rope tests
* trigger ci
* update the case
* this is why i hate rebasing
* maybe fixed?
* oops
* now fix?
* fix last tests and copies
* fix copies?
* fix minimax and gemma3n
* update typo
* deprecation end version
* final fix copies :fingers-crossed:
* oh my, add the docs in toctree
* oke, this is really the last fix
* fix copies and hope that tests won't start failing again
* use rope scaling if saved
* fix slow tests
* fix cwm and unrelated deepseek
* fix last
* update
* hope it works now, it took so long
* lets keep None for now, I will try to remove after checking tests
* some more fixes; find-and-replace does not always find all cases
* last fix of tests
* Arthur's comment for extra forward kwargs
* delete unused code
* fix slow qwen tests
* delete layer types from models
* faulty modular conversion
* fix qwen omni
* fix copies and style
* address my comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2025-10-17 14:57:27 +02:00
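The refactor above reworks how rotary embeddings are wired per layer type and has the model call the rotary embedding once per forward. As a reminder of the underlying operation, a minimal pure-Python sketch of RoPE itself — illustrative only, not the transformers implementation:

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a flat feature vector.

    Each consecutive pair (x[i], x[i+1]) is rotated by the angle
    pos * base**(-i / d), i = 0, 2, 4, ...  Pure illustration.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out
```

At position 0 every angle is zero, so the vector passes through unchanged; rotation also preserves the vector norm at every position.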
0beda2aa3a
Fix Markdown syntax ( #41676 )
...
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
2025-10-17 12:44:27 +00:00
0b3aef1da9
🚨 Remove torchscript support ( #41688 )
...
* remove a lot
* remove the rest
* doc
2025-10-17 13:38:27 +02:00
252d7cd952
Remove deprecated use_auth_token parameter ( #41666 )
...
* Remove deprecated use_auth_token
* code style
* fix test
* Update examples/pytorch/speech-recognition/README.md
2025-10-17 09:57:46 +00:00
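With `use_auth_token` removed, call sites pass `token=` instead. The migration pattern can be sketched as a plain kwarg shim — the helper name is hypothetical; in real code you simply rename the argument in your `from_pretrained` calls:

```python
def migrate_auth_kwarg(kwargs):
    """Rename the removed use_auth_token kwarg to its replacement, token."""
    if "use_auth_token" in kwargs:
        # An explicitly passed token wins over the legacy flag.
        kwargs.setdefault("token", kwargs.pop("use_auth_token"))
    return kwargs

# e.g. AutoModel.from_pretrained("org/model", **migrate_auth_kwarg(old_kwargs))
```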
1eb45cd61d
Fix ckpt in docs ( #41659 )
...
* fix ckpt in docs
* fix config ckpt
2025-10-17 11:00:34 +02:00
354567d955
Adding superglue fast image processing ( #41394 )
...
* Default implementation - no time improvement
* Improved implementation - apparently 2 times faster with only simple function refactor
* elementary torch first approach, still need further implementation of torch first method
* torch-first approach finished
* refactor processor
* refactor test
* partial doc update
* EfficientLoFTRImageProcessorFast based implementation
* EfficientLoFTRImageProcessorFast based implementation
* Logic checked - Test Passed - Validated execution speed
* use modular for efficientloftr
* fix import
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co >
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com >
2025-10-16 19:34:09 +00:00
4dd4133d32
🌐 [i18n-KO] Translated ko-LFM2.md to Korean ( #41502 )
...
* feat: nmt draft
* fix: manual edits
* Update docs/source/ko/model_doc/lfm2.md
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com >
* Update docs/source/ko/model_doc/lfm2.md
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com >
* Update docs/source/ko/model_doc/lfm2.md
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com >
* Update docs/source/ko/model_doc/lfm2.md
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com >
---------
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com >
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com >
2025-10-16 11:29:04 -07:00
eefbf4ac8b
🌐 [i18n-KO] Translated llama4.md to Korean ( #40396 )
...
* docs: ko: llama4.md
* feat: nmt draft
* fix: manual edits
* Update docs/source/ko/model_doc/llama4.md
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com >
* Update docs/source/ko/model_doc/llama4.md
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com >
* Update docs/source/ko/model_doc/llama4.md
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com >
* Update docs/source/ko/model_doc/llama4.md
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com >
---------
Co-authored-by: TaskerJang <bymyself103@naver.com >
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com >
2025-10-16 11:28:27 -07:00
50ca781d78
🌐 [i18n-KO] Translated code_llama.md to Korean ( #40558 )
...
* docs: ko: code_llama.md
* feat: nmt draft
* fix: manual edits
* Apply suggestions from code review
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com >
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com >
* Apply suggestions from code review
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com >
---------
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com >
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com >
2025-10-16 11:27:46 -07:00
8739fc05c4
[i18n-KO] Translated big_bird.md to Korean ( #40445 )
...
* docs: ko: BigBird.md
* feat: nmt draft
* fix: manual edits
2025-10-16 11:23:56 -07:00
77b5ad65ee
🌐 [i18n-KO] Translated sam_hq.md to Korean ( #41340 )
...
* fix: manual edits
* Apply suggestions from code review
Apply suggestions from code review
Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com >
* Apply suggestions from code review
Apply suggestions from code review
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com >
---------
Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com >
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com >
2025-10-16 11:10:16 -07:00
a9731a725e
🌐 [i18n-KO] Translated chat_extras.md to Korean ( #39863 )
...
* docs: ko: chat_extras.md
* feat: nmt draft
* fix: manual edits
* Apply suggestions from code review
* Apply suggestions from code review
* Update docs/source/ko/chat_extras.md
2025-10-16 10:41:03 -07:00
9839d57a02
Fix serving continuous batching ( #41624 )
...
* update-serving-cb
* style
* style
* check none
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2025-10-16 17:24:21 +02:00
2aff20aff6
Fix typos in documentation ( #41641 )
...
Fix typos
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
2025-10-16 12:58:46 +00:00
981370c038
Format Markdown documentation and tiny fixes ( #41638 )
...
* Fix Markdown syntax
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
2025-10-16 12:58:06 +00:00
a59124e27e
Add missing dates to docs ( #41576 )
...
add dates
2025-10-16 09:32:28 +00:00
e20df45bf6
Add Backbone API fine-tuning tutorial ( #41590 )
...
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-10-15 18:42:32 +02:00
19df66dcba
Update executorch.md ( #41582 )
...
* Update executorch.md
* Update executorch.md
* Update executorch.md
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-10-15 09:01:46 -07:00
9f71e3a604
[docs] Duplicate entry ( #41591 )
...
fix
2025-10-15 17:02:36 +02:00
bb0c3af995
More markdown file fixes ( #41599 )
...
* Format markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Format markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Format markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
2025-10-15 12:29:27 +00:00
fcd1ccdb78
[Docs] Fix changed references ( #41614 )
...
* fix
* fix
* other ln
2025-10-15 13:59:13 +02:00
4c8d293599
Fix typesetting and content of llm_tutorial_optimization.md ( #41172 )
...
* Fix typesetting of llm_tutorial_optimization
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix errors
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
2025-10-14 08:40:26 -07:00
3648fde486
Add DINOv3Backbone for ConvNext variant ( #40651 )
...
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
2025-10-14 14:57:04 +02:00
b84c0b31c6
Remove references to AutoModelForVision2Seq ( #41513 )
...
* Since Vision2Seq is deprecated, remove it from pipelines and docstrings
* Catch some more references
2025-10-13 17:00:07 +01:00
1ee3b288a6
[from_pretrained] Small refactor from_pretrained: move around unrelated stuff ( #41445 )
...
* drafts
* up
* simplify modeling utils
* more simplifications
* type kwargs
* up
* move more accelerate related stuff
* safeguarding?
* nits
* remove func when func is NOPE
* more
* nits
* styling
* yups
* up
* ups
* revert
* protect trainer utils import
* fix doc
* Update src/transformers/integrations/peft.py
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co >
* review
* update
* ?
* fixx
* update
* super small update
* ups
* style
* this is stupid
* 🤦 well this was the issue
* small nit
* fix
* nit
* damn the missing return
* one last stupid fix
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co >
2025-10-13 16:33:32 +02:00
cad74496ca
[model] Add VideoLLaMA3 implementation ( #40499 )
...
* Add VideoLLaMA3 implementation
* Run style fix
* Switch to modular
* Fix config and smart_resize
* Fix
* Fix
* Fix style
* Fix
* Ruff fix
* Rename
* Rename
* Fix
* Clean
* Fix consistency
* Add doc
* Fix
* Fix
* Fix doc
* Update generated code
* remove test_initialization
* fix tests
* simplify
* tests
* Add VideoLlama3IntegrationTest
* replace asserts
* fix tests
---------
Co-authored-by: steven-ccq <55176896+steven-ccq@users.noreply.github.com >
Co-authored-by: steven-ccq <1456320989@qq.com >
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co >
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
2025-10-13 15:54:34 +02:00
3813a8e3a1
Add VideoMAE video processor ( #41534 )
...
* Add video processor for VideoMAE
* Document VideoMAE video processor
* Add regression tests for VideoMAE video processor
* refactor: Use direct batch key access for pixel_values_videos
* test: add parity test for VideoMAEVideoProcessor vs VideoMAEImageProcessor
* docs(videomae): update model docstring example to demonstrate VideoMAEVideoProcessor (TorchCodec-based decoding and sampling)
2025-10-13 15:42:27 +02:00
eb28242251
Add MLlama fast image processor ( #41391 )
...
* Merge conflict
* add fast processor
* add fast processor
* make style
* add new convert rgb
* use nested group by shape in mllama fast, add support for multiple inputs in group by shape
* refactor after review
---------
Co-authored-by: Vincent <phamvinh257@gmail.com >
2025-10-13 09:16:05 +00:00
7164924a7e
Fix LaTeX typesetting in documentation ( #41177 )
...
Fix LaTeX typesetting in documentation
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
2025-10-10 08:54:27 -07:00
dfd4121cd4
add Trainer import to .md in appropriate cell block for training.ipynb transformers_doc ( #41484 )
...
add Trainer import to .md in appropriate cell block for docs
2025-10-10 12:04:07 +00:00
c5094a4f97
[voxtral] language detection + skipping lang:xx ( #41225 )
...
* proc + doc update
* improve doc
* add lang:xx in decode
* update voxtral test
* nit
* nit
* update test value
* use regex
2025-10-10 09:18:30 +00:00
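Skipping a leading `lang:xx` tag on decode, as above, amounts to stripping a prefix with a regex. A hedged sketch — the tag format (`lang:` plus a two-letter code) is an assumption here; the actual pattern lives in the Voxtral processor:

```python
import re

# Drop a leading language tag such as "lang:en" from decoded text.
# The tag format is an illustrative assumption.
LANG_TAG = re.compile(r"^\s*lang:[a-z]{2}\s*", re.IGNORECASE)

def strip_lang_tag(text):
    return LANG_TAG.sub("", text, count=1)
```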
e8194fe84f
Fix some tests ( #41503 )
...
* fix
* fix
* doc
2025-10-10 11:05:09 +02:00
d0271be18f
Update philosophy ( #41438 )
...
* update philosophy
* Update docs/source/en/philosophy.md
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Update docs/source/en/philosophy.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* emphasis
---------
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
2025-10-10 06:52:18 +00:00
081391b20e
deprecate jit_mode_eval ( #41376 )
2025-10-09 18:50:45 +02:00
776eea8612
deprecate overwrite_output_dir ( #41323 )
...
* dep
* style
* rm
* wut
* style
2025-10-09 18:36:19 +02:00
3839d51013
report_to default changed to "none" + cleaning deprecated env var (#41375 )
...
* reporting
* fix
* fix
2025-10-09 18:28:48 +02:00
78f79ba5af
Update GLM-4.6 doc ( #41471 )
...
Update glm4_moe.md
2025-10-09 09:18:05 -07:00
1a3a5f5289
Remove SigOpt ( #41479 )
...
* remove sigopt
* style
2025-10-09 18:05:55 +02:00
0eae41ad36
Add Code World Model (CWM) ( #41199 )
...
* [wip][cwm] Code World Model stubs and setup in HF Transformers
* [wip] Get other things working
* [wip] Working
* Tokenizer pad
* fix: cwm window attn
* temp remove test
* temp remove test
* Fixes
* Temporarily add auto config remapping option until VLLM 0.11 is out
* Fix model type and add layer validation
* Lint, remove CwmForSequenceClassification
* Lint, tests
* Remove CwmForSequenceClassification
* Lint
* Remove intermediary layer exports/doc errors, fix tests
* Lint
* run python utils/sort_auto_mappings.py --check_only
* Remove Cwm processor mapping, get check_repo passing
* Remove CwmTextConfig from test
* Add docstring for CwmConfig
* remove global_window and window_pattern params from config
* Fix docstrings
* Revert change to auto docstring util
* lint
* Fixes minus test improvements
* Alter tests to simply check logits
* lint
* Have slow tests use repo, make CwmPretrainedModel passthrough
* Remove decoder layer implementation, use Llama3Decoder + CwmAttention
* Use linear w/o bias for CwmAttention, add token-level integration test
* Don't ignore config attention bias
* Remove attention bias parameter entirely from config
---------
Co-authored-by: galco <galco@meta.com >
2025-10-09 17:57:45 +02:00
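CWM mixes global and sliding-window attention layers. Independent of any CWM specifics, a sliding-window causal mask can be sketched as (pure illustration):

```python
def sliding_window_causal_mask(seq_len, window):
    # mask[i][j] is True where query i may attend key j:
    # causal (j <= i) and within the last `window` positions.
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

A global attention layer is the special case `window >= seq_len`, which reduces the mask to plain causal attention.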
b44d91570f
[v5] remove load_in_4bit and load_in_8bit ( #41287 )
...
* [v5] remove load_in_4bit and load_in_8bit
* fix
* revert
* fix
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
2025-10-09 16:34:04 +02:00
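After this removal, quantization is requested through `quantization_config` (e.g. `BitsAndBytesConfig(load_in_4bit=True)`) rather than the top-level flags. The rename can be sketched as a plain kwarg shim — the helper name is hypothetical, and a dict stands in for the config object; in practice you edit the call site:

```python
def migrate_quant_kwargs(kwargs):
    """Fold the removed load_in_4bit/load_in_8bit flags into a
    quantization_config entry (a dict stands in for BitsAndBytesConfig)."""
    cfg = {}
    for flag in ("load_in_4bit", "load_in_8bit"):
        if kwargs.pop(flag, False):
            cfg[flag] = True
    if cfg:
        kwargs.setdefault("quantization_config", cfg)
    return kwargs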
bf38b2d11d
Change RT-Detr docs to reflect fixed 640x640 input size ( #41364 )
...
* Update rt_detr docs to mention 640x640 input size
The authors of RT-Detr mention that the model was trained on 640x640 images and was meant to be used for inference on 640x640 images.
Also, the current implementation has certain quirks that make training/inferring on images of different sizes problematic. For example,
the pixel masks used for batches of varying image sizes are discarded. I've added a few lines in the docs to notify the user about these issues.
* Batching not possible with variable image sizes
* Remove reference to batching
---------
Co-authored-by: Konstantinos Pitas <kostasp210@gmail.com >
2025-10-09 14:29:16 +00:00
72a3fc275c
Remove infer_device ( #41088 )
...
* Remove infer_device
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix docs using accelerator
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix conflict
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
2025-10-09 14:05:39 +00:00
d1c6310d6a
🚨 [v5] Redundant code in nested configs ( #41314 )
...
* batch update models
* delete even more
* fix modular super init location
* fix
* fix copies
* fix again, these have force-set values in configs
* fix copies
2025-10-09 13:47:44 +02:00
7aa888b7fa
Fix doc ( #41457 )
...
* dummy
* remove
2025-10-08 20:13:21 +02:00
bef73bf8d7
Update hqq.md ( #41452 )
...
fix mistake in model loading
2025-10-08 07:44:56 -07:00
82ffeb28ad
Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation ( #40837 )
...
* init
* added TopH
* Update TopH logits_process.py
* Update logits_process.py
* Update test_logits_process.py
* Update test_logits_process.py
* added test No. 4
* Resolving __init__.py issues
* Resolving configuration_utils.py Issues
* Resolving logits_process.py Issues
* Resolving utils.py Issues
* Resolving test_logits_process.py Issues
* Resolving __init__.py issues
* Resolving logits_process.py Issues
* Resolving __init__.py issues
* Updated Docs
* Updated Docstring
* style: autoformat with make fixup
* Fixing Docstring
* Update logits_process.py removed defaults
* Variable H name -> cumulative_entropy
* Using torch.distributions.Categorical
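* Entropy-bounded truncation keeps the most probable tokens while their cumulative entropy stays within a budget tied to the full distribution's entropy. A minimal pure-Python sketch — the exact bound used by Top-H follows the paper; capping at an alpha-fraction of total entropy is an illustrative assumption here:

```python
import math

def top_h_filter(probs, alpha=0.4):
    # Keep highest-probability tokens while their cumulative entropy
    # contribution -p*log(p) stays within alpha * H(full distribution).
    # The alpha-fraction bound is an assumption, not the paper's exact rule.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    total_h = -sum(p * math.log(p) for p in probs if p > 0)
    kept, h = [], 0.0
    for i in order:
        kept.append(i)  # always keep at least one token
        h += -probs[i] * math.log(probs[i])
        if h > alpha * total_h:
            break
    return sorted(kept)
```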
* Improve torch_dtype checks (#40808 )
* Improve torch_dtype checks
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Apply suggestions from code review
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Add VideoProcessors to auto-backend requirements (#40843 )
* add it
* fix existing ones
* add perception to auto_mapping...
* Adds Causal Conv 1D kernel for mamba models (#40765 )
* add kernel
* make style
* keep causal-conv1d
* small fix
* small fix
* fix modular converter
* modular fix + lazy loading
* revert changes modular
* nit
* hub kernels update
* update
* small nit
* Update no split modules in T5Gemma model (#40810 )
* Update no split modules in T5Gemma model
* Update no_split_modules also for T5Gemma modular
* Remove model_split_percents from test cases
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* Replace image classification loss functions to `self.loss_function` (#40764 )
* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842 )
* align torch implementation of gdn with fla.
* fix fla import.
* fix
* remove unused attr
* fixes
* strictly align l2norm in Qwen3-Next with FLA implementation.
---------
Co-authored-by: bozheng-hit <dsoul0621@gmail.com >
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
* Fixes for continuous batching (#40828 )
* Fix for CB attn mask and refactor
* Tests for CB (not all passing)
* Passing tests and a logger fix
* Fixed the KV metrics that were broken when we moved to hybrid alloc
* Fix circular import and style
* Added tests for FA
* Unfolded test to have device expectations
* Fixes for H100
* more fixes for h100
* H100 are good
* Style
* Adding some comments from #40831
* Rename test
* Avoid 1 letter variables
* Dictionary is only removed during kwargs
* Test for supported sample
* Fix an involuntary slice
* Fixes for non-sliced inputs and small example improvements
* Slicing inputs is more understandable
* Style
* [tests] re-enable aria fast tests (#40846 )
* rise from the dead
* test
* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800 )
* Fix inconsistencies with box input inference with original repo
* remove print
* always pad
* fix modular
* [Sam2Video] Fix video inference with batched boxes and add test (#40797 )
fix video inference with batched boxes and add test
* add: differential privacy research model (#40851 )
* VaultGemma
* Removing Sequence and Token classification models. Removing integration tests for now
* Remove pass-only modular code. style fixes
* Update vaultgemma.md
* Update docs/source/en/model_doc/vaultgemma.md
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* Update docs/source/en/model_doc/vaultgemma.md
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* Add links to model doc
* Correct model doc usage examples
* Updating model doc to describe differences from Gemma 2
* Update model_doc links
* Adding integration tests
* style fixes
* repo consistency
* attribute exception
---------
Co-authored-by: Amer <amersinha@gmail.com >
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852 )
* output_attentions in typed kwargs
* correct typing in GenericForTokenClassification
* improve
* [tests] move generative tests away from `test_modeling_common.py` (#40854 )
move tests
* [generate] Always use decoder config to init cache (#40772 )
* mega derp
* fix
* always use the decoder
* Use checkpoint in auto_class_docstring (#40844 )
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818 )
Fix ParallelismConfig type for accelerate < 1.10.1
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Redirect MI355 CI results to dummy dataset (#40862 )
* [Bug fix #40813 ] Fix base_model_tp_plan of Starcoder2 model. (#40814 )
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com >
* [docstrings / type hints] Update outdated annotations for `past_key_values` (#40803 )
* some fixes
* nits
* indentation
* indentation
* a bunch of type hints
* bulk changes
* fix florence kwargs (#40826 )
* fix: XIELU act parameters not being casted to correct dtype (#40812 )
* Update model tags and integration references in bug report (#40881 )
* [Qwen3 Next] Use numerically stable `rsqrt` (#40848 )
use numerically stable inverse
* Adding Support for Qwen3-VL Series (#40795 )
* add qwen3vl series
* make fixup
* fix import
* re-protect import
* fix it finally (need to merge main into the branch)
* skip processor test (need the checkpoint)
* oups typo
* simplify modular
* remove unnecessary attr
* fix layer
* remove unused rope_deltas args
* reuse image def
* remove unnecessary imports
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co >
* [`VaultGemma`] Update expectations in integration tests (#40855 )
* fix tests
* style
* Fix modular consistency (#40883 )
* reapply modular
* add missing one
* 🔴 Move variable output controls to `_prepare_generation_config ` (#40715 )
* move checks to validate steps where possible
* fix csm and other models that override _sample
* ops dia you again
* opsie
* joao review
* Move variable output controls to `prepare_inputs_for_generation`
* fix a bunch of models
* back to basics
* final touches
* Clarify passing is_causal in sdpa_attention_paged_forward (#40838 )
* Correctly pass is_causal in sdpa_attention_paged_forward
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Improve typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Add comment
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Improve comments
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Revert typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Use torch.expm1 and torch.log1p for better numerical results (#40860 )
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Add Fast PromptDepthAnything Processor (#40602 )
* Test & import setup
* First version passing tests
* Ruff
* Dummy post processing
* Add numerical test
* Adjust
* Doc
* Ruff
* remove unused arg
* Refine interpolation method and push test script
* update bench
* Comments
* Update src/transformers/models/auto/image_processing_auto.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com >
* Remove benchmark script
* Update docstrings
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com >
* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com >
* doc
* further process kwargs
* remove it
* remove
* Remove to dict
* remove crop middle
* Remove param specific handling
* Update testing logic
* remove ensure multiple of as kwargs
* fix formatting
* Remove none default and get image size
* Move stuff to _preprocess_image_like_inputs and refactor
* Clean
* ruff
* End of file & comments
* ruff again
* Padding fixed
* Remove comments to pass tests
* Remove prompt depth from kwargs
* Adjust output_size logic
* Docstring for preprocess
* auto_docstring for preprocess
* pass as an arg
* update test batched
* stack images
* remove prompt scale to meter
* return tensors back in preprocess
* remove copying of images
* Update behavior to match old processor
* Fix batch size of tests
* fix test and fast
* Fix slow processor
* Put tests back to pytorch
* remove check and modify batched tests
* test do_pad + slow processor fix
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com >
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co >
* Fix deta loading & dataclass (#40878 )
* fix
* fix 2
* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882 )
Remove dict branch of attention_mask
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414 )
* fix: manual edits
* Apply suggestions from code review
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/model_doc/smolvlm.md
* Update docs/source/ko/_toctree.yml
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557 )
* feat: manual translation
* docs: fix ko/_toctree.yml
* Apply suggestions from code review
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com >
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com >
* Update docs/source/ko/image_processors.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com >
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* [generate] remove docs of a feature that no longer exists (#40895 )
* Make debugging failing tests (check and update expect output values) easier 🔥 (#40727 )
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Fixing the call to kernelize (#40628 )
* fix
* style
* overload train and eval
* add getter and setter
* Fix getter regression (#40824 )
* test things
* style
* move tests to a sane place
* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902 )
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* [cache] Merge static sliding and static chunked layer (#40893 )
* merge
* get rid of tensors in get_mask_sizes!!
* remove branch
* add comment explanation
* re-add the class with deprecation cycle
* Harmonize CacheLayer names (#40892 )
* unify naming
* style
* doc as well
* post rebase fix
* style
* style
* revert
* [cache] Only use scalars in `get_mask_sizes` (#40907 )
* remove tensor ops
* style
* style
* Set seed for `Glm4vIntegrationTest` (#40905 )
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Add Olmo3 model (#40778 )
* transformers add-new-model-like for Olmo3
* Implement modular Olmo3
* Update Olmo3 tests
* Copy Olmo2 weight converter to Olmo3
* Implement Olmo3 weight converter
* Fix code quality errors
* Remove unused import
* Address rope-related PR comments
* Update Olmo3 model doc with minimal details
* Fix Olmo3 rope test failure
* Fix 7B integration test
* remove dummy EncodingFast (#40864 )
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Improve module name handling for local custom code (#40809 )
* Improve module name handling for local custom code
* Use `%lazy` in logging messages
* Revert "Use `%lazy` in logging messages"
This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.
* Add notes for sanitization rule in docstring
* Remove too many underscores
* Update src/transformers/dynamic_module_utils.py
* Update src/transformers/dynamic_module_utils.py
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com >
* Remove `runner_map` (#40880 )
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* disable `test_fast_is_faster_than_slow` (#40909 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791 )
* gemma3vision compatible with assisted generation
* docstring
* BC
* docstring
* failing checks
* make fixup
* apply changes to modular
* misc fixes
* is_initialized
* fix poor rebase
* [generate] misc fixes (#40906 )
misc fixes
* 🔴 Make `center_crop` fast equivalent to slow (#40856 )
make center_crop fast equivalent to slow
* Fix dtype in Paligemma (#40912 )
* fix dtypes
* fix copies
* delete unused attr
* [Docs] Adding documentation of MXFP4 Quantization (#40885 )
* adding mxfp4 quantization docs
* review suggestions
* Apply suggestions from code review
Co-authored-by: vb <vaibhavs10@gmail.com >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
---------
Co-authored-by: vb <vaibhavs10@gmail.com >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
* Processor load with multi-processing (#40786 )
push
* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832 )
* Remove unused arg
* deprecate
* revrt one change
* get set go
* version correction
* fix
* make style
* comment
* Fix #40067 : Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218 )
* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)
* chore: fix code formatting and linting issues
* refactor: move UMT5 GGUF test to quantization directory and clean up comments
* chore: trigger CI pipeline
* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.
* Add regression check to UMT5 encoder GGUF test
Verify encoder output against reference tensor values with appropriate tolerances for stability.
* Update tests/quantization/ggml/test_ggml.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
* Update tests/quantization/ggml/test_ggml.py
remove comments
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
* [torchao safetensors] renaming get_state_dict function (#40774 )
renaming get_state_dict function
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
* Adding activation kernels (#40890 )
* first commit
* add mode
* revert modeling
* add compile
* rm print
* Minor fix for #40727 (#40929 )
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Add support for Florence-2 training (#40914 )
* Support training florence2
* update doc and testing model to florence-community
* fix florence-2 test, use head dim 16 instead of 8 for fa2
* skip test_sdpa_can_dispatch_on_flash
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add LongCat-Flash (#40730 )
* working draft for LongCat
* BC changes to deepseek_v3 for modular
* format
* various modularities
* better tp plan
* better init
* minor changes
* make modular better
* clean up patterns
* Revert a couple of modular commits, because we won't convert in the end
* make things explicit.
* draft test
* toctree, tests and imports
* drop
* woops
* make better things
* update test
* update
* fixes
* style and CI
* convert stuff
* up
* ah, yes, that
* enable gen tests
* fix cache shape in test (sum of 2 things)
* fix tests
* comments
* re-Identitise
* minimize changes
* better defaults
* modular betterment
* fix configuration, add documentation
* fix init
* add integration tests
* add info
* simplify
* update slow tests
* fix
* style
* some additional long tests
* cpu-only long test
* fix last tests?
* urg
* cleaner tests why not
* fix
* improve slow tests, no skip
* style
* don't upcast
* one skip
* finally fix parallelism
* [DOC] Add missing dates in model cards (#40922 )
add missing dates
* [models] remove unused `import torch.utils.checkpoint` (#40934 )
* Intel CPU dockerfile (#40806 )
* upload intel cpu dockerfile
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* update cpu dockerfile
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* update label name
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941 )
* Fix trainer tests (#40823 )
* fix liger
* fix
* more
* fix
* fix hp
* fix
---------
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com >
* Fix `Glm4vMoeIntegrationTest` (#40930 )
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Raise error instead of warning when using meta device in from_pretrained (#40942 )
* raise instead of warning
* add timm
* remove
* Consistent naming for images kwargs (#40834 )
* use consistent naming for padding
* no validation on pad size
* add warnings
* fix
* fix copies
* another fix
* fix some tests
* fix more tests
* fix lasts tests
* fix copies
* better docstring
* delete print
* Remove nested import logic for torchvision (#40940 )
* remove nested import logic for torchvision
* remove unnecessary protected imports
* remove unnecessary protected import in modular (and modeling)
* fix wrongly removed protected imports
* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947 )
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Update expected values for some `test_speculative_generation` (#40949 )
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Standardize audio embedding function name for audio multimodal models (#40919 )
* Standardize audio embedding function name for audio multimodal models
* PR review
* Add FlexOlmo model (#40921 )
* transformers add-new-model-like
* Add FlexOlmo implementation
* Update FlexOlmo docs
* Set default tokenization for flex olmo
* Update FlexOlmo tests
* Update attention comment
* Remove unneeded use of `sliding_window`
* Don't list dropout in eager_paged_attention_forward (#40924 )
Remove dropout argument
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Update expected values for one more `test_speculative_generation` after #40949 (#40967 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347 )
* fix(trainer): ensure final checkpoint is saved when resuming training
* add test
* make style && slight fix of test
* make style again
* move test code to test_trainer
* remove outdated test file
* Apply style fixes
---------
Co-authored-by: rangehow <rangehow@foxmail.com >
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Add new model LFM2-VL (#40624 )
* Add LFM2-VL support
* add tests
* linting, formatting, misc review changes
* add siglip2 to auto config and instantiate it in lfm2-vl configuration
* decouple image processor from processor
* remove torch import from configuration
* replace | with Optional
* remove layer truncation from modeling file
* fix copies
* update everything
* fix test case to use tiny model
* update the test cases
* fix finally the image processor and add slow tests
* fixup
* typo in docs
* fix tests
* the doc name uses underscore
* address comments from Yoni
* delete tests and unshuffling
* relative import
* do we really handle imports better now?
* fix test
* slow tests
* found a bug in ordering + slow tests
* fix copies
* dont run compile test
---------
Co-authored-by: Anna <anna@liquid.ai >
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com >
* Fix outdated version checks of accelerator (#40969 )
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966 )
use skip_predictor in vjepa2 `get_vision_features`
* [Trainer] Fix DP loss (#40799 )
* fix
* style
* Fix fp16
* style
---------
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com >
* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951 )
* fix(timm): Add exception handling for unknown Gemma3n model
* nit: Let’s cater to this specific issue
* nit: Simplify error handling
* Fix Issue #39030 : AutoTokenizer.from_pretrained does not propagate token (#40956 )
* fix merge conflicts
* change token typing
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal >
* [tests] Really use small models in all fast tests (#40945 )
* start
* xcodec
* chameleon
* start
* layoutlm2
* layoutlm
* remove skip
* oups
* timm_wrapper
* add default
* doc
* consistency
* Add captured actual outputs to CI artifacts (#40965 )
* fix
* fix
* Remove `# TODO: ???` as it make me `???`
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Revert change in `compile_friendly_resize` (#40645 )
fix
* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981 )
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Remove `set_model_tester_for_less_flaky_tests` (#40982 )
remove
* Benchmarking v2 GH workflows (#40716 )
* WIP benchmark v2 workflow
* Container was missing
* Change to sandbox branch name
* Wrong place for image name
* Variable declarations
* Remove references to file logging
* Remove unnecessary step
* Fix deps install
* Syntax
* Add workdir
* Add upload feature
* typo
* No need for hf_transfer
* Pass in runner
* Runner config
* Runner config
* Runner config
* Runner config
* Runner config
* mi325 caller
* Name workflow runs properly
* Copy-paste error
* Add final repo IDs and schedule
* Review comments
* Remove wf params
* Remove parametrization from worfkflow files
* Fix callers
* Change push trigger to pull_request + label
* Add back schedule event
* Push to the same dataset
* Simplify parameter description
* 🔴 [`Attention`] Bert-based Models Attention Refactor (#38301 )
* clean start to bert refactor
* some test fixes
* style
* fix last tests
* be strict on positional embeddings, fixup according tests
* cache support
* more cache fixes, new causal API
* simplify masks, fix tests for gen
* flex attn, static cache support, round of fixes
* ?
* this time
* style
* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)
* roberta
* fixup sdpa remains
* attention split, simplify args and kwargs, better typing
* fix encoder decoder
* fix test
* modular roberta
* albert
* data2vectext, making it modular tomorrow
* modular data2vec text
* tmp disable
* xmod + cache position fixes
* whoops
* electra + markuplm, small fixes
* remove wrong copy
* xlm_roberta + some embedding fixes
* roberta prelayernorm
* RemBert: remove copy, maybe doing it later
* ernie
* fix roberta offloading
* camembert
* copy fixes
* bert generation + fixes on eager
* xlm roberta xl
* bridgetower (text) + seamlessv2 copy fixes
* rocbert + small fixes
* whoops
* small round of fixups
* NOTE: kernels didn't load with an earlier version, some fixup (needs another look bc cross deps)
* the end of the tunnel?
* fixup nllbmoe + style
* we dont need this anymore
* megatron bert is barely used, low prio skip for now
* Modernize bert (template for others)
NOTE: trying to push this through, might be overdue if not in time possible
* check inputs for all others (if checkmarked)
* fix bridgetower
* style
* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)
* proper fix for bert to force intermediate dict outputs
* propagate to others
* style
* xlm roberta xl investigation, its the layernorm...
* mobile bert
* revert this, might cause issues with composed models
* review
* style
* Remove [[autodoc]] refs to TF/Flax objects (#40996 )
* remove refs
* more
* ENH: Enable readline support for transformers chat (#40911 )
ENH Enable readline support for chat
This small change enables GNU readline support for the transformers chat
command. This includes, among others:
- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l
Implementation
Although it may look strange, just importing readline is enough to
enable it in Python, see:
https://docs.python.org/3/library/functions.html#input
As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html ), the import
is guarded.
Readline should work on Linux, macOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline3 (https://pypi.org/project/pyreadline3/ ).
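The guarded import described above can be sketched as follows; this is an illustration of the pattern, not the exact code of the PR, and the `readline_available` helper is a hypothetical name added here:

```python
# Merely importing readline has the side effect of enabling history and
# line editing for input(); the module is missing on some platforms
# (e.g. Windows without pyreadline3), so the import must not be fatal.
try:
    import readline  # noqa: F401  # side effect: enables line editing
except ImportError:
    readline = None


def readline_available() -> bool:
    """Report whether readline support could be enabled."""
    return readline is not None
```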
* [testing] test `num_hidden_layers` being small in model tester (#40992 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* blt wip (#38579 )
* blt wip
* cpu version
* cpu friendly with full entropy model (real time patching)
* adding config file instead of args file
* enable MPS
* refactoring unused code
* single config class in config file
* inherit from PreTrainedModel
* refactor LMTransformer --> BLTPatcher
* add conversion script
* load from new checkpoint with from_pretrained
* fixed demo from_pretrained
* clean up
* clean a few comments
* cleanup folder
* clean up dir
* cleaned up modeling further
* rename classes
* adding transformers Attention class and RotaryEmbedding class
* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc
* separate out patcher config, update modeling and conversion script
* rename vars to be more transformers-like
* rm unused functions
* adding cross attention from transformers
* pass arg
* rename weights
* updated conversion script
* overwritten commit! fixing PR
* apply feedback
* adding BLTRMSNorm like Llama
* add repeat_kv and eager_attention_forward copied from
* BLTMLP identical to MllamaTextMLP
* clean up some args'
* more like mllama, but busier inits
* BLTTransformerLayer config
* decoder, encoder, global configs
* wip working on modular file
* cleaning up patch and configs
* clean up patcher helpers
* clean up patcher helpers further
* clean up
* some config renaming
* clean up unused configs
* clean up configs
* clean up configs
* update modular
* clean
* update demo
* config more like mllama, separated subconfigs from subdicts
* read from config instead of self args
* update demo file
* model weights to causal lm weights
* missed file
* added tied weights keys
* BLTForCausalLM
* adding files after add-new-model-like
* update demo
* working on tests
* first running integration tests
* added integration tests
* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff
* tokenizer clean up
* modular file
* fixing rebase
* ruff
* adding correct basemodel output and updating config with checkpoint vals (for testing)
* BLTModelTests git status
* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic
* fix sdpa == causal tests
* fix small model test and some gradient checkpointing
* skip training GC tests
* fix test
* updated modular
* update modular
* ruff
* adding modular + modeling
* modular
* more modern is_causal check
* cleaning up modular
* more modular reduction
* ruff
* modular fix
* fix styling
* return 2
* return 2
* fix some tests
* fix bltcrossattention after modular break
* some fixes / feedback
* try cache generate fix
* try cache generate fix
* fix generate tests
* attn_impl workaround
* refactoring to use recent TransformersKwargs changes
* fix hidden_states shape test
* refactor to new outputs
* simplify outputs a bit
* rm unneeded decoderlayer overwriting
* rename blt
* forgot tokenizer test renamed
* Reorder
* Reorder
* working on modular
* updates from modular
* new modular
* ruff and such
* update pretrainedmodel modular
* using cohere2 apply_rotary_pos_emb
* small changes
* apply feedback r2
* fix cross_attention
* apply more feedback
* update modeling fix
* load submodules from pretrainedmodel
* set initializer_range to subconfigs
* rm cross_attention_states pass when not needed
* add 7b projection layer support
* check repo
* make copies
* lost cohere2 rotate_half
* ruff
* copies?
* don't tie weights for submodules
* tie weights setting
* check docstrings
* apply feedback
* rebase
* rebased modeling
* update docs
* applying feedback
* few more fixes
* fix can_record_outputs
* fast tokenizer
* no more modulelist
* tok auto
* rm tokenizersss
* fix docs
* ruff
* fix after rebase
* fix test, configs are not subscriptable
---------
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal >
Co-authored-by: Lysandre <hi@lysand.re >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal >
* [docs] rm stray tf/flax autodocs references (#40999 )
rm tf references
* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796 )
* fix
* fixup inits
* oops
* fixup gemma
* fixup modular order
* how does this keep happen lol
* vaultgemma is new i forgot
* remove init check
* Make `EfficientLoFTRModelTest` faster (#41000 )
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Fix typos in src and tests (#40845 )
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955 )
* Fix model cards and modalities in toctree
* fix new models
* RUFF fix on CI scripts (#40805 )
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* fix dict like init for ModelOutput (#41002 )
* fix dict like init
* style
* 🚨 [v5] remove generate output retrocompatibility aliases (#40998 )
remove old type aliases
* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980 )
* update test (and overwrites)
* better test comment
* 0 as a default for
* Patch more `unittest.case.TestCase.assertXXX` methods (#41008 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* 🚨 [v5] remove deprecated entry point (#40997 )
* remove old entry point
* update references to transformers-cli
* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859 )
* fix: bug that made early stop change order of matches
* fix: applied code suggestion
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* fix: applied code suggestion to modular
* fix: integration tests
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
* Fix `PhimoeIntegrationTest` (#41007 )
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Fix Glm4v test (#41011 )
fix
* Update after #41007 (#41014 )
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Fix benchmark runner argument name (#41012 )
* Adding support for Qwen3Omni (#41025 )
* Add Qwen3Omni
* make fix-copies, import properly
* nit
* fix wrong setup. Why was audio_token_id renamed?
* upds
* more processing fixes
* yup
* fix more generation tests
* down to 1?
* fix import issue
* style, update check repo
* up
* fix quality at my best
* final quality?
* fix doc building
* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE
* SKIP THE TEMPLATE ONE
---------
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com >
Co-authored-by: Arthur <arthur.zucker@gmail.com >
* Making compute_loss_func always take priority in Trainer (#40632 )
* logger warn, if-else logic improved
* redundant if condition fix
* Modify Qwen3Omni parameter name since VL changed it (#41045 )
Modify parameter name since VL changed it
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com >
* Fix Qwen video tests (#41049 )
fix test
* [testing] Fix `qwen2_audio` (#41018 )
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Fix typing of tuples (#41028 )
* Fix tuple typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Remove optax (#41030 )
Remove optax dep
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix typos in English/Chinese documentation (#41031 )
* Fix typos and formatting in English docs
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix typos and formatting in Chinese docs
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Use torch.autocast (#40975 )
* Use torch.autocast
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Format code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* docs: improved RoPE function Docstrings (#41004 )
* docs: improved RoPE function docstrings
* Update src/transformers/modeling_rope_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Fix condition for emitting warning when generation exceeds max model length (#40775 )
correct warning when generation exceeds max model length
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
* Fix outdated torch version check (#40925 )
Update torch minimum version check to 2.2
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Remove doc of tf and flax (#41029 )
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485 )
* Add whole word masking
* Vectorize whole word masking functions
* Unit test whole word masking
* Remove support for TF in whole word masking
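For illustration, whole-word masking groups WordPiece sub-tokens so a mask decision covers every piece of a word at once. The sketch below is a simplified stand-in, not the actual `DataCollatorForLanguageModeling` implementation (which also handles special tokens and is vectorized); the function name and defaults are hypothetical:

```python
import random


def whole_word_mask(tokens, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Mask whole words: a '##' prefix marks a WordPiece continuation,
    so a word spans from one non-'##' token up to the next."""
    rng = rng or random.Random()
    word_starts = [i for i, t in enumerate(tokens) if not t.startswith("##")]
    masked = list(tokens)
    for k, start in enumerate(word_starts):
        end = word_starts[k + 1] if k + 1 < len(word_starts) else len(tokens)
        if rng.random() < mask_prob:
            # Mask every sub-token piece of the chosen word.
            for i in range(start, end):
                masked[i] = mask_token
    return masked


tokens = ["trans", "##form", "##ers", "is", "great"]
# With mask_prob=1.0, every word (and every sub-piece of it) is masked.
out = whole_word_mask(tokens, mask_prob=1.0)
```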
* [testing] Fix `seed_oss` (#41052 )
* fix
* fix
* fix
* fix
* fix
* fix
* Update tests/models/seed_oss/test_modeling_seed_oss.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* Remove repeated import (#40937 )
* Remove repeated import
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix conflict
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Simplify unnecessary Optional typing (#40839 )
Remove Optional
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Add write token for uploading benchmark results to the Hub (#41047 )
* Separate write token for Hub upload
* Address review comments
* Address review comments
* Ci utils (#40978 )
* Add CI reports dir to gitignore
* Add utils to run local CI
* Review compliance
* Style
* License
* Remove <frameworkcontent> and <pt> tags from documentation (#41055 )
* Remove <frameworkcontent> and <pt> tags
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Revert changes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Update docs/source/en/model_doc/madlad-400.md
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
* Fix CI jobs being all red 🔴 (false positive) (#41059 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Update quantization CI (#41068 )
* fix
* new everything
* fix
* [i18n-bn] Add Bengali language README file (#40935 )
* [i18n-bn] Add Bengali language README file and update links in existing language files
* Update Bengali README for clarity and consistency in model descriptions
* Improve documentation and errors in Mamba2-based models (#41063 )
* fix bug in Mamba2 docs
* correct 'because on of' issue
* link to other Mamba2 model types
* github URL is not changed
* update error message in generated files
* Update team member list for some CI workflows (#41094 )
* update list
* update list
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* fix crash when using chat to send 2+ request to gptoss (#40536 )
Signed-off-by: Wang, Yi <yi.a.wang@intel.com >
* Minor addition, no split modules for VideoMAEE (#41051 )
* added no split modules
* fixed typo
---------
Co-authored-by: Raushan Turganbay <raushan@huggingface.co >
* Switch to `python:3.10-slim` for CircleCI docker images (#41067 )
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Fix argument name in benchmarking script (#41086 )
* Fix argument name in benchmarking script
* Adjust vars
* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058 )
Remove mention of TensorFlow from English documentation
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix typos in documentation (#41087 )
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix typing (#40788 )
* Fix optional typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix optional typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix schema typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix typing
* Fix typing
* Fix typing
* Fix typing
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Format code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Improve typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix quote string of np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix code
* Format
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Remove unused arguments (#40916 )
* Fix unused arguments
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Remove tf and flax from Chinese documentation (#41057 )
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* fix wrong height and width when read video use torchvision (#41091 )
* docs: Fix Tool Use links and remove dead RAG links (#41104 )
docs: Fix tool use links. Remove dead RAG links. Fix style
* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917 )
* tmp
* fix modular inheritance
* nit
* paligemma 1 doesn't have swa
* use same pattern as in models with hybrid layers
* PR comments
* helium also needs layer_types (bc it relies on gemma)
* paligemma/gemma3: same mask creation fn in fwd and generate
* propagate changes to helium (gemma-based)
* tmp commit
* slow paligemma tests passing, let's see what breaks
* fix test_left_padding_compatibility
* tmp commit
* tmp commit
* rebase error
* docs
* reduce diff
* like this?
* t5gemma
* better comment
* shorter diff
* exception
* ffs type
* optional
* shorter modular_gemma.py
* helium model actually needs no changes -- the tester is the issue
* t5gemma modular config
* a few more modular; paligemma BC
* fix processor issues?
* rm config exception
* lift warning in gemma
* [tests] gpt2 + `CausalLMModelTester` (#41003 )
* tmp commit
* tmp commit
* tmp commit
* rm old GPT2ModelTester
* nit bug
* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns
* vision_encoder_decoder
* Fix `_get_test_info` for inherited tests (#41106 )
* fix _get_test_info
* fix patched
* add comment
* ruff
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Remove bad test skips (#41109 )
* remove bad skips
* remove more
* fix inits
* Format empty lines and white space in markdown files. (#41100 )
* Remove additional white space and empty lines from markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Add empty lines around code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809 )
Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com >
* 🚨 [V5] Remove deprecated training arguments (#41017 )
* Remove deprecated training arguments from V5
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Remove deprecated training arguments from V5
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix comments
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Support loading LFM2 GGUF (#41111 )
* add gguf config mapping for lfm2
* add lfm2 tensor process to unsqueeze conv weights
* adjust values from gguf config to HF config
* add test for lfm2 gguf
* ruff
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* [torchao safetensors] integrate torchao safetensors support with transformers (#40735 )
* enable torchao safetensors
* enable torchao safetensors support
* add more version checking
* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963 ) (#41036 )
* fix mismatched dims for qwen3 next
* propagate changes
* chore: renamed tot_heads to total_sequence_length
* Apply suggestion from @vasqu
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* minor fix to modular qwen3 next file
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
* Fix the error where a keyword argument appearing before *args (#41099 )
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Fix broken `` expressions in markdown files (#41113 )
Fix broken expressions in markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Remove self-assignment (#41062 )
* Remove self-assignment
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Update src/transformers/integrations/flash_paged.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com >
* Clear pass
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Clear pass
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Clear pass
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com >
* 🚨 Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928 )
* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning
* docs(text2text_generation): Update parameter comments to reflect modern generation practice
Update the max_length parameter comment to max_new_tokens, in line with the modern standard practice of specifying the number of newly generated tokens
* refactor(text2text_generation): Remove outdated input validation logic
* docs(text2text_generation): Revert incorrectly modified comment
* docs(text2text_generation): Revert incorrectly modified comment
* Fixed MXFP4 model storage issue (#41118 )
* Fixed loading LongT5 from legacy checkpoints (#40724 )
* Fixed loading LongT5 from legacy checkpoints
* Adapted the fix to work with missing lm_head
* dummy commit (#41133 )
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
* Fix loading logic flaw with regards to unexpected and missing keys (#40850 )
* Unexpected keys should be ignored at load with device map
* remove them all
* fix logic flaw
* fix
* simplify
* style
* fix
* revert caching allocator change
* add other test
* add nice doc
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
* Using torch.distributions.Categorical
* Resolving logits_process.py Issues
* style: autoformat with make fixup
* Update logits_process.py removed defaults
* Variable H name -> cumulative_entropy
* Resolving format error
* Correction of the loop variables in logit processor
* Vectorized the loop in logits_process
* formatted logits_process
* paper reference and stopping rule comment logits_process
* Trigger CI rerun
* Update logits_process.py
* added test_TopH_example_integration
* added test_TopH_example_integration
* Update README.md
* Restore CI config to match main (remove accidental changes)
* Restore CI config to match upstream main (no diffs)
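The commits above add a Top-H logits processor with a cumulative-entropy stopping rule. A rough pure-Python sketch of one plausible reading of that rule follows; the threshold semantics and function shape are assumptions for illustration, and the actual processor operates on torch logits in logits_process.py:

```python
import math

def top_h_keep(probs, entropy_budget):
    """Illustrative entropy-based truncation: keep the most probable
    tokens until their cumulative entropy contribution (-p * log p)
    exceeds a budget. Assumed semantics, not the exact TopH rule."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative_entropy = [], 0.0
    for i in order:
        p = probs[i]
        if p <= 0.0:
            break
        cumulative_entropy += -p * math.log(p)
        kept.append(i)
        if cumulative_entropy > entropy_budget:  # stopping rule
            break
    return kept

# A peaked distribution exhausts the budget after few tokens.
print(top_h_keep([0.7, 0.2, 0.05, 0.05], 0.5))  # -> [0, 1]
```

A vectorized torch version (as the "Vectorized the loop" bullet suggests) would sort the probabilities once and use a cumulative sum instead of the explicit loop.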
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com >
Signed-off-by: jiqing-feng <jiqing.feng@intel.com >
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
Signed-off-by: Wang, Yi <yi.a.wang@intel.com >
Co-authored-by: ArminAzizi98 <147081650+ArminAzizi98@users.noreply.github.com >
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com >
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com >
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co >
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
Co-authored-by: Yuchao Zhang <418121364@qq.com >
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com >
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com >
Co-authored-by: Bo Zheng <368586905@qq.com >
Co-authored-by: bozheng-hit <dsoul0621@gmail.com >
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com >
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com >
Co-authored-by: Ryan Mullins <ryanmullins@google.com >
Co-authored-by: Amer <amersinha@gmail.com >
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com >
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com >
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com >
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com >
Co-authored-by: NanoCode012 <nano@axolotl.ai >
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
Co-authored-by: 艾力可 <178652170+thalahors@users.noreply.github.com >
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com >
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com >
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com >
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co >
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com >
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com >
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com >
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com >
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com >
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com >
Co-authored-by: Shane A <shanea@allenai.org >
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn >
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com >
Co-authored-by: Raushan Turganbay <raushan@huggingface.co >
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com >
Co-authored-by: vb <vaibhavs10@gmail.com >
Co-authored-by: Yaswanth Gali <82788246+yaswanth19@users.noreply.github.com >
Co-authored-by: Akshay Babbar <priv.akshay@outlook.com >
Co-authored-by: liangel-02 <liangel@meta.com >
Co-authored-by: Duc-Viet Hoang <vietyb00@gmail.com >
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com >
Co-authored-by: lilin-1 <256404019@qq.com >
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com >
Co-authored-by: Jack <32371937+jackzhxng@users.noreply.github.com >
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com >
Co-authored-by: rangehow <rangehow@foxmail.com >
Co-authored-by: Anna <anna@liquid.ai >
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com >
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com >
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com >
Co-authored-by: Branden <brandenkmurray@gmail.com >
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal >
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com >
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal >
Co-authored-by: Lysandre <hi@lysand.re >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal >
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal >
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com >
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com >
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com >
Co-authored-by: Arthur <arthur.zucker@gmail.com >
Co-authored-by: Ayush <ayushtanwar1729@gmail.com >
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org >
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com >
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com >
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com >
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com >
Co-authored-by: Wang, Yi <yi.a.wang@intel.com >
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com >
Co-authored-by: Jinde.Song <juude.song@gmail.com >
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com >
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com >
Co-authored-by: YangKai0616 <kai.yang@intel.com >
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com >
Co-authored-by: souvikku <107592858+souvikku@users.noreply.github.com >
2025-10-08 13:37:51 +00:00
2166e26cb1
[torchao] Add regex support for ModuleFqnToConfig ( #41242 )
...
* Add regex support for ModuleFqnToConfig
Summary:
Similar to https://github.com/pytorch/ao/pull/3084 we added regex support
in transformers so people can use regex to quantize the models.
See https://github.com/pytorch/ao/pull/3084 for docs and precedence of different
configurations
Uploaded model: https://huggingface.co/torchao-testing/opt-125m-ModuleFqnToConfig-v1-regex-0.14.0.dev
Test Plan:
pytest tests/quantization/torchao_integration/test_torchao.py -k test_module_fqn_to_config_regex
* Apply style fixes
* add assert for
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com >
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2025-10-08 11:05:15 +00:00
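The regex matching described above can be illustrated with a small standalone sketch. The dict layout, the "re:" prefix convention, and exact-match-first precedence are assumptions for illustration; see the linked pytorch/ao PR for the real semantics and precedence rules:

```python
import re

# Hypothetical mapping of module fully-qualified names (FQNs) to
# quantization configs, with regex entries marked by a "re:" prefix.
module_fqn_to_config = {
    "model.layers.0.self_attn.q_proj": "int8-exact",
    "re:model\\.layers\\.\\d+\\.mlp\\..*": "int4-regex",
}

def resolve_config(fqn: str):
    # Exact matches take precedence (assumed).
    if fqn in module_fqn_to_config:
        return module_fqn_to_config[fqn]
    # Then regex entries, in insertion order.
    for key, cfg in module_fqn_to_config.items():
        if key.startswith("re:") and re.fullmatch(key[3:], fqn):
            return cfg
    return None

print(resolve_config("model.layers.0.self_attn.q_proj"))  # exact hit
print(resolve_config("model.layers.7.mlp.up_proj"))       # regex hit
```

Regex entries let one config line cover every layer of a repeated block instead of enumerating each FQN by hand.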