20920 Commits

Author SHA1 Message Date
b3e3c3dc93 [Qwen3VL] fix device mismatch error for FSDP2 training (#41536)
For FSDP2, parameters might be on a meta device, and the weight.device attribute may
not accurately reflect where the actual computation will happen during forward passes.

```log
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 776, in forward
    pos_embeds = self.fast_pos_embed_interpolate(grid_thw)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 745, in fast_pos_embed_interpolate
    pos_embeds = self.pos_embed(idx_tensor) * weight_tensor[:, :, None]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1879, in _call_impl
    return inner()
           ^^^^^^^
  File "torch/nn/modules/module.py", line 1827, in inner
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/sparse.py", line 192, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "torch/nn/functional.py", line 2546, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)
```
https://github.com/volcengine/verl/pull/3686#issuecomment-3380981817
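
A minimal sketch of one way to avoid this kind of mismatch, not the actual patch: create index tensors on the device of the incoming tensor instead of reading `weight.device`. `PosEmbedLookup` and its constructor arguments are hypothetical names used only for illustration.

```python
import torch
from torch import nn


class PosEmbedLookup(nn.Module):
    """Hypothetical module that looks up one embedding per grid position."""

    def __init__(self, num_positions: int, dim: int):
        super().__init__()
        self.pos_embed = nn.Embedding(num_positions, dim)

    def forward(self, grid_thw: torch.Tensor) -> torch.Tensor:
        # Take the device from the input rather than from
        # self.pos_embed.weight.device, which may not match the device
        # used for computation when parameters are sharded or on meta.
        device = grid_thw.device
        height, width = int(grid_thw[0, 1]), int(grid_thw[0, 2])
        idx_tensor = torch.arange(height * width, device=device)
        return self.pos_embed(idx_tensor)
```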

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-14 10:28:25 +00:00
b84c0b31c6 Remove references to AutoModelForVision2Seq (#41513)
* Since Vision2Seq is deprecated, remove it from pipelines and docstrings

* Catch some more references
2025-10-13 17:00:07 +01:00
1ee3b288a6 [from_pretrained] Small refactor from_pretrained: move around unrelated stuff (#41445)
* drafts

* up

* simplify modeling utils

* more simplifications

* type kwargs

* up

* move more accelerate related stuff

* safeguarding?

* nits

* remove func when func is NOPE

* more

* nits

* styling

* yups

* up

* ups

* revert

* protect trainer utils import

* fix doc

* Update src/transformers/integrations/peft.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* review

* update

* ?

* fixx

* update

* super small update

* ups

* style

* this is stupid

* 🤦 well this was the issue

* small nit

* fix

* nit

* damn the missing return

* one last stupid fix

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-13 16:33:32 +02:00
cad74496ca [model] Add VideoLLaMA3 implementation (#40499)
* Add VideoLLaMA3 implementation

* Run style fix

* Switch to modular

* Fix config and smart_resize

* Fix

* Fix

* Fix style

* Fix

* Ruff fix

* Rename

* Rename

* Fix

* Clean

* Fix consistency

* Add doc

* Fix

* Fix

* Fix doc

* Update generated code

* remove test_initialization

* fix tests

* simplify

* tests

* Add VideoLlama3IntegrationTest

* replace asserts

* fix tests

---------

Co-authored-by: steven-ccq <55176896+steven-ccq@users.noreply.github.com>
Co-authored-by: steven-ccq <1456320989@qq.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-13 15:54:34 +02:00
3813a8e3a1 Add VideoMAE video processor (#41534)
* Add video processor for VideoMAE

* Document VideoMAE video processor

* Add regression tests for VideoMAE video processor

* refactor: Use direct batch key access for pixel_values_videos

* test: add parity test for VideoMAEVideoProcessor vs VideoMAEImageProcessor

* docs(videomae): update model docstring example to demonstrate VideoMAEVideoProcessor (TorchCodec-based decoding and sampling)
2025-10-13 15:42:27 +02:00
66d8d7a077 Fixed typos and formatting (#34215)
#hacktoberfest
2025-10-13 13:38:06 +00:00
d621be8286 🚨 [v5] generate delegates default cache initialization to the model (#41505)
2025-10-13 13:20:48 +01:00
d7c9fbdb64 Enable modular files from other libraries (#41372)
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-13 13:48:32 +02:00
41e763decd Add AMD developer cloud support (#41126)
* Add AMD developer cloud support

* Add AMD remote svg link.

* Update notebooks/README.md

Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com>

---------

Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com>
2025-10-13 12:17:24 +02:00
cf1e9834ec Restore cuda graphs to continuous batching (#41421)
* Type hints and small fixes

* Remove unused params

* Made slice inputs the default

* ruffed

* Updated some var name and moved index slicing

* Logging arg in example

* Added some padding debug var and reformat out cg

* First working CG, fixed size

* Working flexible CG

* CG are compatible with all implementations

* Fixed CG API

* Update example

* Documentation

* Fix padding tokens in FA

* Review compliance

* Better doc around weird bug

* Style

* Fix for sliding with CG
2025-10-13 11:57:56 +02:00
6c901bdc0e [SAM] Fix typing hints (#41506)
fix
2025-10-13 11:52:00 +02:00
58f9e13313 Fixed Type-hints in function definitions (#41525)
* Explicitly annotate default None parameters as Optional

* make style.

* make style.

* Fixed check_copies.

* fix consistency.
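
For illustration, the first bullet amounts to a change like the one below. This is a sketch only: `resize` and its default value are hypothetical and not taken from the PR.

```python
from typing import Optional


# Before: `size: int = None` relies on an implicit Optional, which
# type checkers reject in strict mode.
# After: the None default is reflected in the annotation.
def resize(size: Optional[int] = None) -> int:
    """Return the requested size, or a fallback when none is given."""
    return 224 if size is None else size
```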
2025-10-13 11:48:37 +02:00
eb28242251 Add MLlama fast image processor (#41391)
* Merge conflict

* add fast processor

* add fast processor

* make style

* add new convert rgb

* use nested group by shape in mllama fast, add support for multiple inputs in group by shape

* refactor after review

---------

Co-authored-by: Vincent <phamvinh257@gmail.com>
2025-10-13 09:16:05 +00:00
65cb8fac6d [Qwen3VL] fix: hidden_states in place modification error (#41535)
```log
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 941, in forward
    hidden_states = self._deepstack_process(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 960, in _deepstack_process
    hidden_states[visual_pos_masks, :] = local_this
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Output 0 of SliceBackward0 is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.
```
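
As the error message suggests, cloning before the in-place write avoids modifying the view. The sketch below shows that pattern under stated assumptions; `deepstack_add` is a hypothetical helper, not the function from the modeling file.

```python
import torch


def deepstack_add(
    hidden_states: torch.Tensor,
    visual_pos_masks: torch.Tensor,
    visual_embeds: torch.Tensor,
) -> torch.Tensor:
    """Add visual embeddings at masked positions without touching the input."""
    # Clone first so the masked assignment writes to a fresh tensor
    # instead of a view produced by an earlier op.
    hidden_states = hidden_states.clone()
    local_this = hidden_states[visual_pos_masks, :] + visual_embeds
    hidden_states[visual_pos_masks, :] = local_this
    return hidden_states
```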

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-13 10:50:14 +02:00
3927ffed31 [testing] reduce runtime of HunYuanMoEV1IntegrationTest:test_model_generation (#41373)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-10 22:27:01 +02:00
7164924a7e Fix Latex typesetting in documentation (#41177)
Fix Latex typesetting in documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-10 08:54:27 -07:00
26a5368c44 Allow optuna's catch kwargs passthrough (#41496)
* allow optuna's catch kwargs passthrough

* apply ruff formatting

---------

Co-authored-by: nicha <nicha.api@nectec.or.th>
2025-10-10 13:58:07 +00:00
feca4f3de7 remove tpu_num_cores (#41383)
* remove-tpu-num-cores

* fix

* let's remove it

* style

* Update examples/legacy/seq2seq/finetune_tpu.sh

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-10 15:53:28 +02:00
c6042a4169 Remove outdated flags (#41512)
remove flags
2025-10-10 14:34:47 +02:00
dfd4121cd4 add Trainer import to .md in appropriate cell block for training.ipynb transformers_doc (#41484)
add Trainer import to .md in appropriate cell block for docs
2025-10-10 12:04:07 +00:00
60f6ec438a Fix detectron2 import (#41510)
* fix

* fix

* typo
2025-10-10 13:33:47 +02:00
f9f8bf5a10 Revert local_rank deletion and some cleaning (#41504)
* forgot those

* clean

* Fix

* merge

* fix

* fix
2025-10-10 12:23:04 +02:00
b4067472ae Bump to hfh 1.0.0.rc5 to fix test (#41508)
2025-10-10 12:12:08 +02:00
bc529a3368 More trainer cleaning (#41489)
clean
2025-10-10 11:55:43 +02:00
b92fc0c6e1 [QoL] modular conversion shows LoC saved (#41500)
smol qol conversion
2025-10-10 11:55:23 +02:00
2eae7c7452 Set truncation to False in Qwen3Omni to avoid default truncation (#41473)
* Set `truncation` to `False` in Qwen3Omni to avoid default truncation

* move `padding` and `truncation` to audio default args

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-10-10 09:55:18 +00:00
c5094a4f97 [voxtral] language detection + skipping lang:xx (#41225)
* proc + doc update

* improve doc

* add lang:xx in decode

* update voxtral test

* nit

* nit

* update test value

* use regex
2025-10-10 09:18:30 +00:00
f4487ec521 fix gemma3n case failure (#41426)
* fix gemma3n case failure

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update dependency_versions_table.py

* change how the test case passes arguments so that it passes;
the generation_config approach needs revisiting

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-10 09:15:27 +00:00
e8194fe84f Fix some tests (#41503)
* fix

* fix

* doc
2025-10-10 11:05:09 +02:00
9556b36b2f [causallm tester] automate pipeline mappings + bloom tests (#41318)
2025-10-10 10:02:00 +01:00
5aca530b34 [Parakeet] unnecessary warning & auto mapping (#41412)
* add parakeet to CONFIG_MAPPING_NAMES

* TOKENIZER_MAPPING_NAMES update

* fix auto tokenizer

* update

* fix
2025-10-10 11:00:15 +02:00
4f323369db Fixed tiny incorrect imports in glm4v (#41483)
Fixed tiny import issue in glm4v
2025-10-10 08:57:01 +00:00
f5f3457278 Try to remove pickle - BloomTokenizerFast (#41466)
* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-10 10:52:51 +02:00
3585737746 [kernels] rm yoso kernel (#41495)
* disable kernel mapping

* rm kernel

* delete files

* style

* typo
2025-10-10 10:50:12 +02:00
b543679d0e [kernels] Remove RWKV kernel finally ! (#41493)
* rm kernel

* fix style
2025-10-10 10:32:05 +02:00
ac7777be16 fix bnb model loading (#41499)
2025-10-10 08:27:29 +00:00
17c31a98ac Streaming should be handled at the request-level rather than at the instance level (#41444)
* Streaming should be handled at the request-level rather than at the instance level

* Add tests

* Require torch GPU
2025-10-10 10:24:55 +02:00
b28902c86b Remove DISABLE_KERNEL_MAPPING flag (#41475)
rm disable
2025-10-10 10:19:25 +02:00
d0271be18f Update philosophy (#41438)
* update philosophy

* Update docs/source/en/philosophy.md

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/philosophy.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* emphasis

---------

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-10-10 06:52:18 +00:00
0419ff881d Remove local_rank arg from TrainingArguments (#41382)
2025-10-09 18:54:12 +02:00
081391b20e deprecate jit_mode_eval (#41376)
2025-10-09 18:50:45 +02:00
1ddbbdef48 [Trainer] deprecate ray scope (#41403)
2025-10-09 18:50:00 +02:00
c20849bad1 [CI] Fix copies on main (#41486)
fix copies
2025-10-09 18:38:14 +02:00
776eea8612 deprecate overwrite_output_dir (#41323)
* dep

* style

* rm

* wut

* style
2025-10-09 18:36:19 +02:00
3839d51013 report_to default changed to "none" + cleaning deprecated env var (#41375)
* reporting

* fix

* fix
2025-10-09 18:28:48 +02:00
78f79ba5af Update GLM-4.6 doc (#41471)
Update glm4_moe.md
2025-10-09 09:18:05 -07:00
11c597b1b8 Remove deprecated args in Trainer for v5 (#41404)
remove deprecated code
2025-10-09 18:10:14 +02:00
b450d55a91 Remove past_index (#41384)
* remove-tpu-num-cores

* fix

* rm past index

* Revert "fix"

This reverts commit 7608a6c059210957d3a77812e66178c8b79a9313.

* Revert "remove-tpu-num-cores"

This reverts commit ef08a51d71389849851518d67d8ad6c9ea8f04fc.
2025-10-09 18:06:46 +02:00
1a3a5f5289 Remove SigOpt (#41479)
* remove sigopt

* style
2025-10-09 18:05:55 +02:00
823fab4860 Fix bnb fsdp loading for pre-quantized checkpoint (#41415)
* fix

* fix

* get_param_name

* fix device name
2025-10-09 18:05:35 +02:00