transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-10-20 09:03:53 +08:00

Author	SHA1	Message	Date
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	b3e3c3dc93	[Qwen3VL] fix device mismatch error for FSDP2 training (#41536 ) For FSDP2, parameters might be on a meta device, and the weight.device attribute may not accurately reflect where the actual computation will happen during forward passes. ```log File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 776, in forward pos_embeds = self.fast_pos_embed_interpolate(grid_thw) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 745, in fast_pos_embed_interpolate pos_embeds = self.pos_embed(idx_tensor) * weight_tensor[:, :, None] ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl return self._call_impl(args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "torch/nn/modules/module.py", line 1879, in _call_impl return inner() ^^^^^^^ File "torch/nn/modules/module.py", line 1827, in inner result = forward_call(args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "torch/nn/modules/sparse.py", line 192, in forward return F.embedding( ^^^^^^^^^^^^ File "torch/nn/functional.py", line 2546, in embedding return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select) ``` https://github.com/volcengine/verl/pull/3686#issuecomment-3380981817 Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-14 10:28:25 +00:00
Matt	b84c0b31c6	Remove references to AutoModelForVision2Seq (#41513 ) * Since Vision2Seq is deprecated, remove it from pipelines and docstrings * Catch some more references	2025-10-13 17:00:07 +01:00
Arthur	1ee3b288a6	[`from_pretrained`] Small refactor `from_pretrained`: move around unrelated stuff (#41445 ) * drafts * up * simplify modeling utils * more simplifications * type kwargs * up * move more accelerate related stuff * safeguarding? * nits * remove func when func is NOPE * more * nits * styling * yups * up * ups * revert * protect trainer utils iport * fix doc * Update src/transformers/integrations/peft.py Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> * review * update * ? * fixx * update * super small update * ups * style * this is stupid * 🤦 well this was the issue * small nit * fix * nit * damn the missing return * one last stupid fix --------- Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>	2025-10-13 16:33:32 +02:00
Kehan Li	cad74496ca	[model] Add VideoLLaMA3 implementation (#40499 ) * Add VideoLLaMA3 implementation * Run style fix * Switch to modular * Fix config and smart_resize * Fix * Fix * Fix style * Fix * Ruff fix * Rename * Rename * Fix * Clean * Fix consistency * Add doc * Fix * Fix * Fix doc * Update generated code * remove test_initialization * fix tests * simplify * tests * Add VideoLlama3IntegrationTest * replace asserts * fix tests --------- Co-authored-by: steven-ccq <55176896+steven-ccq@users.noreply.github.com> Co-authored-by: steven-ccq <1456320989@qq.com> Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-10-13 15:54:34 +02:00
Akilesh	3813a8e3a1	Add VideoMAE video processor (#41534 ) * Add video processor for VideoMAE * Document VideoMAE video processor * Add regression tests for VideoMAE video processor * refactor: Use direct batch key access for pixel_values_videos * test: add parity test for VideoMAEVideoProcessor vs VideoMAEImageProcessor * docs(videomae): update model docstring example to demonstrate VideoMAEVideoProcessor (TorchCodec-based decoding and sampling)	2025-10-13 15:42:27 +02:00
Julian Ste	66d8d7a077	Fixed typos and formatting (#34215 ) #hacktoberfest	2025-10-13 13:38:06 +00:00
Joao Gante	d621be8286	🚨 [v5] `generate` delegates default cache initialization to the model (#41505 )	2025-10-13 13:20:48 +01:00
regisss	d7c9fbdb64	Enable modular files from other libraries (#41372 ) Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-10-13 13:48:32 +02:00
fan-amd	41e763decd	Add AMD developer cloud support (#41126 ) * Add AMD developer cloud support * Add AMD remote svg link. * Update notebooks/README.md Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com> --------- Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com> Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com>	2025-10-13 12:17:24 +02:00
Rémi Ouazan	cf1e9834ec	Restore cuda graphs to continuous batching (#41421 ) * Type hints and small fixes * Remove unused params * Made slice inputs the default * ruffed * Updated some var name and moved index slicing * Logging arg in example * Added some padding debug var and reformat out cg * First working CG, fixed size * Working flexible CG * CG are compatible with all implementations * Fixed CG API * Update example * Documentation * Fix padding tokens in FA * Review compliance * Better doc around weird bug * Style * Fix for sliding with CG	2025-10-13 11:57:56 +02:00
Raushan Turganbay	6c901bdc0e	[SAM] Fix typing hints (#41506 ) fix	2025-10-13 11:52:00 +02:00
Sai-Suraj-27	58f9e13313	Fixed Type-hints in function definitions (#41525 ) * Explicitly annotate default None parameters as Optional * make style. * make style. * Fixed check_copies. * fix consistency.	2025-10-13 11:48:37 +02:00
Yoni Gozlan	eb28242251	Add MLlama fast image processor (#41391 ) * Merge conflict * add fast processor * add fast processor * make style * add new convert rgb * use nested group by shape in mllama fast, add support for multiple inputs in group by shape * refactor after review --------- Co-authored-by: Vincent <phamvinh257@gmail.com>	2025-10-13 09:16:05 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	65cb8fac6d	[Qwen3VL] fix: hidden_states in place modification error (#41535 ) ``` File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 941, in forward hidden_states = self._deepstack_process( ^^^^^^^^^^^^^^^^^^^^^^^^ File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 960, in _deepstack_process hidden_states[visual_pos_masks, :] = local_this ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^ RuntimeError: Output 0 of SliceBackward0 is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function. ``` Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-13 10:50:14 +02:00
Yih-Dar	3927ffed31	[testing] reduce runtime of `HunYuanMoEV1IntegrationTest:test_model_generation` (#41373 ) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-10-10 22:27:01 +02:00
Yuanyuan Chen	7164924a7e	Fix Latex typesetting in documentation (#41177 ) Fix Latex typesetting in documentation Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>	2025-10-10 08:54:27 -07:00
nicha-api	26a5368c44	Allow optuna's catch kwargs passthrough (#41496 ) * allow optuna's catch kwargs passthrough * apply ruff formatting --------- Co-authored-by: nicha <nicha.api@nectec.or.th>	2025-10-10 13:58:07 +00:00
Marc Sun	feca4f3de7	remove `tpu_num_cores` (#41383 ) * remove-tpu-num-cores * fix * let's remove it * style * Update examples/legacy/seq2seq/finetune_tpu.sh Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> --------- Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>	2025-10-10 15:53:28 +02:00
Cyril Vallez	c6042a4169	Remove outdated flags (#41512 ) remove flags	2025-10-10 14:34:47 +02:00
Benjamin Keene	dfd4121cd4	add Trainer import to .md in appropriate cell block for training.ipynb transformers_doc (#41484 ) add Trainer import to .md in appropriate cell block for docs	2025-10-10 12:04:07 +00:00
Cyril Vallez	60f6ec438a	Fix detectron2 import (#41510 ) * fix * fix * typo	2025-10-10 13:33:47 +02:00
Marc Sun	f9f8bf5a10	Revert `local_rank` deletion and some cleaning (#41504 ) * forgot those * clean * Fix * merge * fix * fix	2025-10-10 12:23:04 +02:00
Lucain	b4067472ae	Bump to hfh 1.0.0.rc5 to fix test (#41508 )	2025-10-10 12:12:08 +02:00
Marc Sun	bc529a3368	More trainer cleaning (#41489 ) clean	2025-10-10 11:55:43 +02:00
Pablo Montalvo	b92fc0c6e1	[QoL] modular conversion shows LoC saved (#41500 ) smol qol conversion	2025-10-10 11:55:23 +02:00
BakerBunker	2eae7c7452	Set `truncation` to `False` in Qwen3Omni to avoid default truncation (#41473 ) * Set `truncation` to `False` in Qwen3Omni to avoid default truncation * move `padding` and `truncation` to audio default args --------- Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>	2025-10-10 09:55:18 +00:00
eustlb	c5094a4f97	[voxtral] language detection + skipping lang:xx (#41225 ) * proc + doc update * improve doc * add lang:xx in decode * update voxtral test * nit * nit * update test value * use regex	2025-10-10 09:18:30 +00:00
Yao Matrix	f4487ec521	fix gemma3n case failure (#41426 ) * fix gemma3n case failure Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * fix style Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * Update dependency_versions_table.py * change the case argument passing way to make the case PASS, generation_config way need re-visit Signed-off-by: Yao, Matrix <matrix.yao@intel.com> * fix style Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao, Matrix <matrix.yao@intel.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>	2025-10-10 09:15:27 +00:00
Cyril Vallez	e8194fe84f	Fix some tests (#41503 ) * fix * fix * doc	2025-10-10 11:05:09 +02:00
Joao Gante	9556b36b2f	[causallm tester] automate pipeline mappings + bloom tests (#41318 )	2025-10-10 10:02:00 +01:00
eustlb	5aca530b34	[Parakeet] unnecessary warning & auto mapping (#41412 ) * add parakeet to CONFIG_MAPPING_NAMES * TOKENIZER_MAPPING_NAMES update * fix auto tokenizer * update * fix	2025-10-10 11:00:15 +02:00
Sai-Suraj-27	4f323369db	Fixed tiny incorrect imports in `glm4v` (#41483 ) Fixed tiny import issue in glm4v	2025-10-10 08:57:01 +00:00
Yih-Dar	f5f3457278	Try to remove `pickle` - `BloomTokenizerFast` (#41466 ) * pickle 1 * pickle 1 * pickle 1 * pickle 1 * pickle 1 * pickle 1 --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-10-10 10:52:51 +02:00
Mohamed Mekkouri	3585737746	[kernels] rm yoso kernel (#41495 ) * disable kernel mapping * rm kernel * delete files * style * typo	2025-10-10 10:50:12 +02:00
Mohamed Mekkouri	b543679d0e	[kernels] Remove RWKV kernel finally ! (#41493 ) * rm kernel * fix style	2025-10-10 10:32:05 +02:00
jiqing-feng	ac7777be16	fix bnb model loading (#41499 )	2025-10-10 08:27:29 +00:00
Lysandre Debut	17c31a98ac	Streaming should be handled at the request-level rather than at the instance level (#41444 ) * Streaming should be handled at the request-level rather than at the instance level * Add tests * Require torch GPU	2025-10-10 10:24:55 +02:00
Mohamed Mekkouri	b28902c86b	Remove DISABLE_KERNEL_MAPPING flag (#41475 ) rm disable	2025-10-10 10:19:25 +02:00
Pablo Montalvo	d0271be18f	Update philosophy (#41438 ) * update philosophy * Update docs/source/en/philosophy.md Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> * Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/philosophy.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * emphasis --------- Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-10-10 06:52:18 +00:00
Marc Sun	0419ff881d	Remove `local_rank` arg from `TrainingArguments` (#41382 )	2025-10-09 18:54:12 +02:00
Marc Sun	081391b20e	deprecate `jit_mode_eval` (#41376 )	2025-10-09 18:50:45 +02:00
Marc Sun	1ddbbdef48	[Trainer] deprecate ray scope (#41403 )	2025-10-09 18:50:00 +02:00
Anton Vlasjuk	c20849bad1	[`CI`] Fix copies on main (#41486 ) fix copies	2025-10-09 18:38:14 +02:00
Marc Sun	776eea8612	deprecate `overwrite_output_dir` (#41323 ) * dep * style * rm * wut * style	2025-10-09 18:36:19 +02:00
Marc Sun	3839d51013	`report_to` default changed to "none" + cleaning deprecated env var (#41375 ) * reporting * fix * fix	2025-10-09 18:28:48 +02:00
Yuxuan Zhang	78f79ba5af	Update GLM-4.6 doc (#41471 ) Update glm4_moe.md	2025-10-09 09:18:05 -07:00
Marc Sun	11c597b1b8	Remove deprecated args in Trainer for v5 (#41404 ) remove deprecated code	2025-10-09 18:10:14 +02:00
Marc Sun	b450d55a91	Remove `past_index` (#41384 ) * remove-tpu-num-cores * fix * rm past index * Revert "fix" This reverts commit 7608a6c059210957d3a77812e66178c8b79a9313. * Revert "remove-tpu-num-cores" This reverts commit ef08a51d71389849851518d67d8ad6c9ea8f04fc.	2025-10-09 18:06:46 +02:00
Marc Sun	1a3a5f5289	Remove SigOpt (#41479 ) * remove sigopt * style	2025-10-09 18:05:55 +02:00
Marc Sun	823fab4860	Fix bnb fsdp loading for pre-quantized checkpoint (#41415 ) * fix * fix * get_param_name * fix device name	2025-10-09 18:05:35 +02:00

20920 Commits