transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-10-20 17:13:56 +08:00

Author	SHA1	Message	Date
Yuanyuan Chen	12a50f294d	Enable FURB rules in ruff (#41395 ) * Apply ruff FURB rules Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * Enable ruff FURB rules Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * More fixes Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * More fixes Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * Revert changes Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * More fixes Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> --------- Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>	2025-10-17 15:00:40 +00:00
Yuanyuan Chen	080d704af1	Fix Pylint warnings (#41644 ) * Fix pylint warnings Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * More fixes Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * Raise with an exception Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> --------- Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>	2025-10-17 13:09:42 +00:00
Raushan Turganbay	10de06dace	🚨 [v5] Refactor RoPE for layer types (#39847 ) * update * batch update model code * typos * too many diffs, dump * dump again * another dump * fix copies * make `rope_scaling_dict` self attr * fix a few more tests * another update * fix a few more tests, hopefully last ones * fox copies * fix copies again * fix newly added models, I hate rebasing on main * update config files * modular files * fix rope utils test * docstring has to be indented more, why? * oops forgot to update some modualr files * copy from doesn't copy decorators? * fix overriden test as well * add a new test * fix failing tests again * update docstrings * fix phi3 * fix two models * fix copies * forgot to add * stupid bug from modular conversion * fix slow tests * update to call rotary emb once per model forward * 3K tests failing?! * update * update more models * fix copies * fix the rest of tests hopefully * fix after rebase * fix the rope tests * fix docs omni * change a bit * models with layer types * why it was deleted? * fix a few tests * fix last test! * delete extra empty lines * add a test case * more changes * fix models * typing hint for nested rope params * missed when resolving conflicts * delete layer types and fix typo * fix copies * fix copies * update docs text * docs * huuge update all models * fix copies * rename attr to align with new format * delete redundant rope tests * trigger ci * update the case * this is why i hate rebasing * maybe fixed? * oops * now fix? * fix last tests and copies * fix copies? * fix minimax and gemma3n * update typo * deprecation end version * final fix copies :fingers-crossed: * oh my, add the docs in toctree * oke, this is really the last fix * fix copies and hope that tests won't start failing again * use rope scaling if saved * fix slow tests * fix cwm and unrelated deepseek * fix last * update * hope it works now, it took so long * lets keep None for now, I will try to remove after checking tests * some more fixes, i find and replace does not always find all cases * last fix of tests * arthur's comment for extra foreward kwargs * delete unused code * fix slow qwen tests * delete layer types from models * faulty modular conversion * fix qwen omni * fix copies and style * address my comment --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-10-17 14:57:27 +02:00
Lucain	252d7cd952	Remove deprecated `use_auth_token` parameter (#41666 ) * Remove deprecated use_auth_token * code styl * fix test * Update examples/pytorch/speech-recognition/README.md	2025-10-17 09:57:46 +00:00
Rémi Ouazan	cf1e9834ec	Restore cuda graphs to continuous batching (#41421 ) * Type hints and small fixes * Remove unusued params * Made slice inputs the default * ruffed * Updated some var name and moved index slicing * Logging arg in example * Added some padding debug var and reformat out cg * First working CG, fixe size * Working flexible CG * CG are compatible with all implementations * Fixed CG API * Update example * Documentation * Fix padding tokens in FA * Review compliance * Better doc around weird bug * Style * Fix for sliding with CG	2025-10-13 11:57:56 +02:00
Marc Sun	feca4f3de7	remove `tpu_num_cores` (#41383 ) * remove-tpu-num-cores * fix * let's remove it * style * Update examples/legacy/seq2seq/finetune_tpu.sh Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-10-10 15:53:28 +02:00
Marc Sun	f9f8bf5a10	Revert `local_rank` deletion and some cleaning (#41504 ) * forgot those * clean * Fix * merge * fix * fix	2025-10-10 12:23:04 +02:00
Marc Sun	0419ff881d	Remove `local_rank` arg from `TrainingArguments` (#41382 )	2025-10-09 18:54:12 +02:00
Marc Sun	081391b20e	deprecate `jit_mode_eval` (#41376 )	2025-10-09 18:50:45 +02:00
Marc Sun	776eea8612	deprecate `overwrite_output_dir` (#41323 ) * dep * style * rm * wut * style	2025-10-09 18:36:19 +02:00
Yuanyuan Chen	2b5e4c0d13	Import Callable from collections.abc (#41130 ) Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>	2025-10-09 12:12:43 +00:00
Arthur	8dfc8e8cfc	🤦 CB nit! (#41413 ) * 🤦 * updates * update cb simple * merge * up * update * fix * up * nit * rumble this is annoying * update * update * up * fix * .... * cleanup a bit * nit * typo * typing and typo * nit * updates * up * final fix! * update * fix more import issues * nuke is paged * up	2025-10-08 13:36:27 +02:00
Cyril Vallez	46db0edf3b	🚨🚨 Remove all traces of legacy cache format (#41378 ) * remove * more * add back * tests * revert classes * tests * add exceptions * reapply modular * rename * oupsi * start with whisper * fix tests * fix * fix * fix * typing	2025-10-08 11:14:44 +02:00
Cyril Vallez	242eb9cbdc	Remove deprecation warning (#41425 ) * remove * fix space	2025-10-07 19:21:14 +02:00
Arthur	0395ed52ae	[`CB`] Refactors the way we access paged (#41370 ) * up * refactor the way we handle paged attention * affect serve as well * update * fix * cup	2025-10-06 17:55:31 +02:00
Yuanyuan Chen	fa36c973fc	Remove unnecessary list comprehension (#41305 ) Remove unnecessary comprehension Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>	2025-10-06 14:49:02 +00:00
Anton Vlasjuk	c27b67f0cd	🚨 [`v5`] Remove relative position embeddings (for bert like models) (#41170 ) * remove from modeling files * remaining changes * style / copies * revert deprecated models and fixup some models * oops	2025-10-06 14:21:41 +02:00
Raushan Turganbay	9db58abd6e	Check model inputs - hidden states (#40994 ) * update all models * fix copies * skip aria tests * update other models * skip should be in test, not tester * i think this is more descriptive as a name * find and replace for new models	2025-10-06 11:48:52 +02:00
Cyril Vallez	163601c619	Standardize `PretrainedConfig` to `PreTrainedConfig` (#41300 ) * replace * add metaclass for full BC * doc * consistency * update deprecation message * revert	2025-10-06 11:34:02 +02:00
Yuanyuan Chen	894a2bdd8c	Fix pylint generator warnings (#41258 ) Fix pylint generator warnings Signed-off-by: cyy <cyyever@outlook.com>	2025-10-02 12:35:42 +00:00
Yuanyuan Chen	1cc9069551	Fix unnecessary single-item container checks (#41279 ) Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>	2025-10-02 12:35:11 +00:00
Marc Sun	103fa6d235	[v5] Remove deprecated prediction loop (#41123 ) * rem deprecated * more * rm all instances of legacy arg	2025-09-30 17:43:01 +02:00
Marc Sun	dded9fd112	[v5] More Training Args cleaning (#41131 ) clean	2025-09-30 17:38:07 +02:00
Anton Vlasjuk	52f5eca7c9	🚨 [`v5`] Remove headmasking (#41076 ) * first attempt at removing * copies * last bits in core * quick fixes * tests purge * docs and examples * some fixes * more * another round of cleanups * fix * fix a bunch of models * fix dummy bert * fix * fix new model * fix signature change * fix * fix style/copies * new models * fix copies didnt find that damn * test * this shouldnt have happened during model addition	2025-09-30 16:04:57 +02:00
Marc Sun	06c04e0851	Deprecate `half_precision_backend` (#41134 ) * deprecate * remove * rm apex * fix * fix * fix doc	2025-09-30 11:36:44 +02:00
OMOTAYO OMOYEMI	42c682514b	docs/examples(speech): pin CTC commands to Hub datasets; add Windows notes (#41027 ) * examples(speech): load Common Voice from Hub; remove deprecated dataset-script references (Windows-friendly notes) * docs/examples(speech): pin CTC streaming & other CTC commands to Hub datasets; add Windows notes * make style * examples(speech): align DataTrainingArguments help with datasets docs; minor wording fixes * docs/examples(speech): address review remove Hub subsection & Whisper tip; align dataset help text * style: apply ruff/black/usort/codespell on examples/speech-recognition * Apply style fixes * Update examples/pytorch/speech-recognition/README.md * update doc to match load_dataset --------- Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com> Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-09-30 08:38:31 +00:00
Cyril Vallez	5426edecab	Make quantizers good citizens loading-wise (#41138 ) * fix param_needs_quantization * rewrite most hqq * clean * fix * comment * remove it from exception of safetensors * start on bnb 4bits * post-rebase fix * make bnb4 bit a good citizen * remove forgotten print * make bnb 8bits a good citizen * better hqq * fix * clean * remove state dict from signature * switch method * make torchao a good citizen * fixes * fix torchao * add check * typo	2025-09-29 17:04:45 +02:00
Lysandre Debut	10f6891fc5	Remove data from examples (#41168 ) Remove telemetry	2025-09-26 13:52:45 +02:00
Rémi Ouazan	97ca0b4712	Fix flash-attn for paged_attention when no kernels (#41078 ) * Fix non-kernels flash attention paged implementation * Cover all cases * Style * Update src/transformers/integrations/flash_paged.py Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> * Apply style fixes --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>	2025-09-26 10:41:21 +02:00
Cyril Vallez	e54bb62a73	Simplify and improve model loading logic (#41103 ) * remove unexpected keys from inputs (they have nothing to do there) * remove input * simplify a lot init * fix * fix check for non-persistent buffer * revert because too many old and bad models... * remove comment * type hint * make it a real test * remove model_to_load -> always use the same model * typo * remove legacy offload_folder (we never waste that memory anymore) * do not change prefix anymore * change very bad function name * create adjust method * remove useless method * restrict * BC * remove unused method * CI * remove unused args * small fix * fix * CI * CI * avoid too many loops * fix regex * cleaner * typo * fix * fix	2025-09-25 17:28:27 +02:00
Yuanyuan Chen	65dcd66cc8	🚨 [V5] Remove deprecated training arguments (#41017 ) * Remove deprecated training arguments from V5 Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * Remove deprecated training arguments from V5 Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * Fix comments Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> * Fix code Signed-off-by: Yuanyuan Chen <cyyever@outlook.com> --------- Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>	2025-09-24 12:01:27 +02:00
Cyril Vallez	4df2529d79	🚨🚨🚨 Fully remove Tensorflow and Jax support library-wide (#40760 ) * setup * start the purge * continue the purge * more and more * more * continue the quest: remove loading tf/jax checkpoints * style * fix configs * oups forgot conflict * continue * still grinding * always more * in tje zone * never stop * should fix doc * fic * fix * fix * fix tests * still tests * fix non-deterministic * style * remove last rebase issues * onnx configs * still on the grind * always more references * nearly the end * could it really be the end? * small fix * add converters back * post rebase * latest qwen * add back all converters * explicitly add functions in converters * re-add	2025-09-18 18:27:39 +02:00
Rémi Ouazan	ef053939ca	Fixes for continuous batching (#40828 ) * Fix for CB attn mask and refactor * Tests for CB (not all passing) * Passing tests and a logger fix * Fixed the KV metrics that were broken when we moved to hybrid alloc * Fix circular import and style * Added tests for FA * Unfolded test to have device expectations * Fixes for H100 * more fixes for h100 * H100 are good * Style * Adding some comments from #40831 * Rename test * Avoid 1 letter variables * Dictonnary is only removed during kwargs * Test for supported sample * Fix a unvoluntary slice * Fixes for non-sliced inputs and small example improvments * Slice inputs is more understandabe * Style	2025-09-12 15:35:31 +02:00
Rémi Ouazan	1cdbbb3e9d	Support sliding window in CB (#40688 ) * CB example: better compare feature * Cache managers, still issue w/ effective length * WIP -- fix for effective length * Renames * Wroking, need better parity checks, we mind be missing 1 token * Small fixes * Fixed wrong attn mask and broke cache into pieces * Warmup is slowing down things, disabling it * Cache was too big, fixed * Simplified index objects * Added a profile option to the example * Avoid calls to memory reporing tools * Restore full attention read indices for better latency * Adressed some TODOS and style * Docstrings for cache managers * Docstrings for Schedulers * Refactor scheudlers * [Important] Cache fix for sliding window, check with small sw size * Updated doc for cache memory compute and cache as a whole * Moved a todo * Nits and style * Fix for when sliding window is smaller than max batch per token * Paged interface update * Support for FLash in new API * Fix example CB * Fix bug in CB for paged * Revert example * Style * Review compliance * Style * Styleeeee * Removed NO_SLIDING_WINDOW * Review #2 compliance * Better art * Turn cum_seqlens_k in a dict * Attn mask is now a dict * Update examples/pytorch/continuous_batching.py Co-authored-by: Luc Georges <McPatate@users.noreply.github.com> * Adressed McPatate pro review * Style and fix --------- Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>	2025-09-09 15:51:11 +02:00
Yuanyuan Chen	fd2a29d468	Fix more typos (#40627 ) Fix typos Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>	2025-09-08 16:05:40 +00:00
Kashif Rasul	3f7bda4209	[Continous Batching] fix do_Sample=True in continuous batching (#40692 ) * fix do_Sample=True in continous batching * added test * fix top_p * test * Update examples/pytorch/continuous_batching.py	2025-09-08 10:30:15 +02:00
Matt	fe1a9e0dba	Remove TF/Flax examples (#40654 ) * Remove TF/Flax examples * Remove check_full_copies * Trigger CI	2025-09-03 14:15:57 +01:00
Yuanyuan Chen	a470f21396	Enable more ruff UP rules (#40579 ) * Import Sequence from collections.abc Signed-off-by: cyy <cyyever@outlook.com> * Apply ruff UP rules Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-09-02 17:29:59 +02:00
Rémi Ouazan	21e708c8fd	Fix for missing default values in encoder decoder (#40517 ) * Added default_value for is_updated and type check * Forgot one * Repo consistency	2025-09-01 16:11:23 +02:00
Yuanyuan Chen	a543095c99	Fix typos (#40585 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-09-01 12:58:23 +00:00
Lysandre	ce48e9cac0	Dev version	2025-08-29 20:17:34 +02:00
Cyril Vallez	becab2c601	Use the config for DynamicCache initialization in all modelings (#40420 ) * update all * remove the most horrible old code * style	2025-08-28 14:32:30 +02:00
Rémi Ouazan	34108a2230	Continuous batching refactor (#40426 ) * Rework of the CB example * Further rework of CB example * Refactor PA cache, slice on tokens, add debug prints -- WIP * Slice cache -- WIP * Added a mechanism to check batched outputs in CB script * Less logging, debug flag for slice, !better reset! -- WIP * QOL and safety margins * Refactor and style * Better saving of cb example * Fix * Fixes and QOL * Mor einformations about metrics * Further logging * Style * Licenses * Removed some comments * Add a slice input flag * Fix in example * Added back some open-telemetry deps * Removed some aux function * Added FA2 option to example script * Fixed math (all of it) * Added a simple example * Renamed core to classes * Made allocation of attention mask optionnal * Style	2025-08-26 13:01:42 +02:00
Manuel de Prada Corral	49e168ff08	🚨 Remove Contrastive Search decoding strategy (#40428 ) * delete go brrr * fix tests * review	2025-08-26 12:31:46 +02:00
Pablo Montalvo	ba095d387d	🧹 🧹 🧹 Get set decoder cleanup (#39509 ) * simplify common get/set * remove some noise * change some 5 years old modeling utils * update examples * fix copies * revert some changes * fixes, gah * format * move to Mixin * remove smolvlm specific require grad * skip * force defaults * remodularise some stuff * remodularise more stuff * add safety for audio models * style * have a correct fallback, you daft donkey * remove this argh * change heuristic for audio models * fixup * revert * this works * this should be explicit * fix Nth ESM exception * tryout decoder * this as well * revert again * 🧠 * aaah ESM has two modelings aaah * broom broom * format * wrong copies * copies * modular cleanups * format * modularities * wrong mergefix * seriously * align with new model * new model	2025-08-25 10:57:56 +02:00
Cyril Vallez	d8f6d3790a	⚠️⚠️ Use `dtype` instead of `torch_dtype` everywhere! (#39782 ) * update everywhere * style * pipelines * switch it everywhere in tests * switch it everywhere in docs * switch in converters everywhere * update in examples * update in model docstrings * style * warnings * style * Update configuration_utils.py * fix * Update configuration_utils.py * fixes and add first test * add pipeline tests * Update test_pipelines_common.py * add config test * Update test_modeling_common.py * add new ones * post rebase * add new * post rebase adds	2025-08-22 12:34:16 +02:00
Matteo Destro	56c44213b3	[detection] fix attention mask for RT-DETR-based models (#40269 ) * Fix get_contrastive_denoising_training_group attention * Add bool attention_mask conversion	2025-08-19 09:15:56 +00:00
Manuel de Prada Corral	a36d51e801	🚨 Always return Cache objects in modelings (to align with generate) (#39765 ) * watch the world burn * fix models, pipelines * make the error a warning * remove kwargs and return_legacy_cache * fix reformer	2025-08-18 16:26:35 +02:00
Yuanyuan Chen	6333eb986a	Fix more typos (#40212 ) Signed-off-by: cyy <cyyever@outlook.com>	2025-08-18 12:52:12 +00:00
Yuanyuan Chen	28a03fb78a	Fix various Pylint warnings (#40107 ) Tidy code Signed-off-by: cyy <cyyever@outlook.com>	2025-08-15 12:40:12 +00:00


2689 Commits