20501 Commits

Author SHA1 Message Date
5f6e278a51 Remove set_model_tester_for_less_flaky_tests (#40982)
remove
2025-09-18 18:56:10 +02:00
4df2529d79 🚨🚨🚨 Fully remove TensorFlow and JAX support library-wide (#40760)
* setup

* start the purge

* continue the purge

* more and more

* more

* continue the quest: remove loading tf/jax checkpoints

* style

* fix configs

* oops, forgot conflict

* continue

* still grinding

* always more

* in the zone

* never stop

* should fix doc

* fix

* fix

* fix

* fix tests

* still tests

* fix non-deterministic

* style

* remove last rebase issues

* onnx configs

* still on the grind

* always more references

* nearly the end

* could it really be the end?

* small fix

* add converters back

* post rebase

* latest qwen

* add back all converters

* explicitly add functions in converters

* re-add
2025-09-18 18:27:39 +02:00
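
As a rough illustration of what this removal means for users, here is a minimal sketch, assuming the legacy `from_tf`/`from_flax` keyword arguments of `from_pretrained` are among the removed TF/JAX entry points:

```python
from transformers import AutoModel

# PyTorch / safetensors checkpoints load exactly as before.
model = AutoModel.from_pretrained("bert-base-uncased")

# Converting a TensorFlow or Flax checkpoint at load time is part of what was
# removed; after this change a call like the following is expected to fail.
# model = AutoModel.from_pretrained("path/to/tf_checkpoint", from_tf=True)
```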
5ac3c5171a Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 18:27:27 +02:00
d9d7f6a6b9 Revert change in compile_friendly_resize (#40645)
fix
2025-09-18 16:25:45 +01:00
738b223f57 Add captured actual outputs to CI artifacts (#40965)
* fix

* fix

* Remove `# TODO: ???` as it makes me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 15:40:53 +02:00
dd7ac4cd59 [tests] Really use small models in all fast tests (#40945)
* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oops

* timm_wrapper

* add default

* doc

* consistency
2025-09-18 15:24:12 +02:00
2ce35a248f Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)
* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
2025-09-18 13:22:19 +00:00
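
A minimal sketch of the call pattern this fix affects; the repo id is a placeholder for a private or gated model:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "your-org/your-gated-model",  # hypothetical gated repo
    token="hf_xxx",               # this token should now reach every hub request made while resolving files
)
```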
6e51ac31ef [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)
* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling
2025-09-18 14:09:08 +01:00
9378f874c1 [Trainer] Fix DP loss (#40799)
* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-18 13:07:20 +00:00
7cf1f5ced0 Use skip_predictor=True in vjepa2 get_vision_features (#40966)
use skip_predictor in vjepa2 `get_vision_features`
2025-09-18 11:51:45 +00:00
f6104189fd Fix outdated version checks of accelerator (#40969)
* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 11:49:14 +00:00
c532575795 Add new model LFM2-VL (#40624)
* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unshuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
2025-09-18 11:01:58 +00:00
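
A hypothetical usage sketch for the new model; the checkpoint id and the choice of auto classes are assumptions, not taken from the PR:

```python
import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "LiquidAI/LFM2-VL"  # placeholder checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text="Describe this image.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```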
564fde14f1 FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)
* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-18 09:57:21 +00:00
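
A minimal sketch of the resume path the fix targets; model and dataset construction is omitted and the argument values are illustrative only:

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(output_dir="ckpts", save_strategy="steps", save_steps=500)
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)  # model/dataset defined elsewhere

# Before the fix, a run resumed near its end could finish without writing a
# final checkpoint; with it, the last state is saved on the resumed run too.
trainer.train(resume_from_checkpoint=True)
```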
5748352c27 Update expected values for one more test_speculative_generation after #40949 (#40967)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 11:47:14 +02:00
438343d93f Don't list dropout in eager_paged_attention_forward (#40924)
Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 09:05:50 +00:00
449da6bb30 Add FlexOlmo model (#40921)
* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`
2025-09-18 09:04:06 +00:00
3bb1b4867c Standardize audio embedding function name for audio multimodal models (#40919)
* Standardize audio embedding function name for audio multimodal models

* PR review
2025-09-18 08:45:04 +00:00
58e13b9f12 Update expected values for some test_speculative_generation (#40949)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 20:50:38 +02:00
529d3a2b06 Fix Glm4vModelTest::test_eager_matches_fa2_generate (#40947)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 19:53:59 +02:00
a2ac4de8b0 Remove nested import logic for torchvision (#40940)
* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessary protected import in modular (and modeling)

* fix wrongly removed protected imports
2025-09-17 13:34:30 -04:00
8e837f6ae2 Consistent naming for images kwargs (#40834)
* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fix copies

* another fix

* fix some tests

* fix more tests

* fix last tests

* fix copies

* better docstring

* delete print
2025-09-17 18:40:25 +02:00
eb04363a0d Raise error instead of warning when using meta device in from_pretrained (#40942)
* raise instead of warning

* add timm

* remove
2025-09-17 18:23:37 +02:00
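
A sketch of the behavior change, assuming the meta placement is requested through `device_map`; the exact exception type is also an assumption:

```python
from transformers import AutoModel

try:
    # Previously this only logged a warning and returned a model full of meta
    # tensors; it is now expected to raise instead.
    model = AutoModel.from_pretrained("bert-base-uncased", device_map="meta")
except ValueError as err:
    print(err)
```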
ecc1d778ce Fix Glm4vMoeIntegrationTest (#40930)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 18:21:18 +02:00
c5553b4120 Fix trainer tests (#40823)
* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-17 16:05:17 +00:00
14f01aee39 docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941) 2025-09-17 08:48:38 -07:00
26b65fb516 Intel CPU dockerfile (#40806)
* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-09-17 15:42:30 +00:00
66f97d3f64 [models] remove unused import torch.utils.checkpoint (#40934) 2025-09-17 16:37:56 +01:00
3853bfe4d5 [DOC] Add missing dates in model cards (#40922)
add missing dates
2025-09-17 11:17:06 -04:00
6cade29278 Add LongCat-Flash (#40730)
* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism
2025-09-17 14:48:10 +02:00
48a5565179 Add support for Florence-2 training (#40914)
* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-17 11:49:56 +00:00
89949c5d2d Minor fix for #40727 (#40929)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 11:42:13 +02:00
c830fc1207 Adding activation kernels (#40890)
* first commit

* add mode

* revert modeling

* add compile

* rm print
2025-09-17 11:36:09 +02:00
f6999b00c3 [torchao safetensors] renaming get_state_dict function (#40774)
renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 11:20:50 +02:00
8428c7b9c8 Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)
* Fix #40067: add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 09:15:55 +00:00
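
A hedged sketch of the GGUF loading path this extends to UMT5; the repo and file names are placeholders, not the ones used in the tests:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo_id = "your-org/umt5-small-gguf"   # hypothetical GGUF repo
gguf_file = "umt5-small.Q8_0.gguf"     # hypothetical quantized file

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, gguf_file=gguf_file)
```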
ddd4caf066 [Llama4] Remove image_sizes arg and deprecate vision_feature_layer (#40832)
* Remove unused arg

* deprecate

* revert one change

* get set go

* version correction

* fix

* make style

* comment
2025-09-17 09:14:13 +00:00
b82cd1c240 Processor load with multi-processing (#40786)
push
2025-09-17 09:46:49 +02:00
6e50a8afb2 [Docs] Adding documentation of MXFP4 Quantization (#40885)
* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-16 11:31:28 -07:00
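
A sketch of the kind of usage the new docs cover, assuming the `Mxfp4Config` quantization config is the documented entry point; the checkpoint id is a placeholder:

```python
from transformers import AutoModelForCausalLM, Mxfp4Config

model = AutoModelForCausalLM.from_pretrained(
    "your-org/mxfp4-quantized-model",   # placeholder MXFP4 checkpoint
    quantization_config=Mxfp4Config(),  # class name assumed from the feature name
    device_map="auto",
)
```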
cccef4be91 Fix dtype in Paligemma (#40912)
* fix dtypes

* fix copies

* delete unused attr
2025-09-16 16:07:56 +00:00
beb09cbd5a 🔴 Make center_crop fast equivalent to slow (#40856)
make center_crop fast equivalent to slow
2025-09-16 16:01:38 +00:00
d4af0d9f03 [generate] misc fixes (#40906)
misc fixes
2025-09-16 15:18:06 +01:00
3b3f6cd0c1 [gemma3] Gemma3ForConditionalGeneration compatible with assisted generation (#40791)
* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase
2025-09-16 15:08:48 +01:00
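
An illustrative sketch of assisted generation with the vision model as the target; the checkpoint names are assumptions and the draft model only needs to share the tokenizer vocabulary:

```python
from transformers import AutoProcessor, AutoModelForCausalLM, Gemma3ForConditionalGeneration

main_id = "google/gemma-3-4b-it"       # assumed multimodal checkpoint
assistant_id = "google/gemma-3-1b-it"  # assumed smaller draft checkpoint

processor = AutoProcessor.from_pretrained(main_id)
model = Gemma3ForConditionalGeneration.from_pretrained(main_id)
assistant = AutoModelForCausalLM.from_pretrained(assistant_id)

inputs = processor(text="Write a haiku about autumn.", return_tensors="pt")
outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=40)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```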
88ba0f107e disable test_fast_is_faster_than_slow (#40909)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:34:04 +02:00
270da89708 Remove runner_map (#40880)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:18:07 +02:00
df03fc1f9c Improve module name handling for local custom code (#40809)
* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-16 13:11:48 +00:00
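
A sketch of the local custom-code path being improved: loading a checkpoint whose modeling code sits in the same directory, where the directory name has to be sanitized into a valid Python module name (the path is a placeholder):

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "./checkpoints/my model (v2)",  # hypothetical local path with characters invalid in a module name
    trust_remote_code=True,         # required so the bundled custom modeling code is imported
)
```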
96bc19bcdf remove dummy EncodingFast (#40864)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-16 12:56:11 +00:00
d0af4269ec Add Olmo3 model (#40778)
* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test
2025-09-16 13:28:23 +02:00
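
A hypothetical usage sketch for the new model; the checkpoint id is a placeholder and only the standard causal-LM auto classes are assumed:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "allenai/Olmo-3-7B"  # placeholder name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```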
65f9ede359 Set seed for Glm4vIntegrationTest (#40905)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 13:01:51 +02:00
0c1839d609 [cache] Only use scalars in get_mask_sizes (#40907)
* remove tensor ops

* style

* style
2025-09-16 12:48:58 +02:00
3688a977d0 Harmonize CacheLayer names (#40892)
* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert
2025-09-16 12:14:12 +02:00
087775d10e [cache] Merge static sliding and static chunked layer (#40893)
* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle
2025-09-16 11:41:20 +02:00