transformers

mirror of https://github.com/huggingface/transformers.git synced 2025-10-20 17:13:56 +08:00

Author	SHA1	Message	Date
Joao Gante	de5cbe8b79	[deprecations] Remove generate-related deprecations up to v4.56 (#40729) remove generate-related deprecations up to v4.56	2025-09-09 16:32:41 +01:00
Yih-Dar	a2fffa505d	Fetch more test data with `hf_hub_download` (#40710) [test-all] tests Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-09-05 09:49:31 +00:00
Yih-Dar	5b0c01b5e2	Final test data cache - inside CI docker images (#40689) * run * build * build * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-09-04 13:12:49 +00:00
Raushan Turganbay	1f3cc935cc	Load a tiny video to make CI faster (#40684) * load a tiny video to make CI faster * add video in url_to_local_path	2025-09-04 14:49:00 +02:00
Yih-Dar	30a4b8707d	CircleCI docker images cleanup / update / fix (#40681) * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-09-04 10:42:18 +02:00
Yih-Dar	34595cf296	Even more test data cached (#40636) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-09-03 21:20:37 +00:00
Matt	fe1a9e0dba	Remove TF/Flax examples (#40654) * Remove TF/Flax examples * Remove check_full_copies * Trigger CI	2025-09-03 14:15:57 +01:00
Yih-Dar	e690fe61e8	Fix `too many requests` in `TestMistralCommonTokenizer` (#40623) * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-09-03 05:05:03 +02:00
Yih-Dar	91be12bdc6	Avoid `too many request` caused by `AutoModelTest::test_dynamic_saving_from_local_repo` (#40614) * fix * fix * fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-09-02 12:08:52 +02:00
Yih-Dar	4da03d7f57	Reduce more test data fetch (#40595) * example * fix * fix * add to fetch script --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-09-01 18:07:18 +02:00
Raushan Turganbay	0b24507379	processor tests - use dummy videos (#40537) * use dummy videos * failing on main, new model merged had conflicts	2025-09-01 09:04:47 +00:00
ivarflakstad	2d3b8863e8	Fix collated reports upload filename (#40556)	2025-08-30 09:35:51 +02:00
EduardDurech	d10603f701	Add Apertus (#39381 ) * init swissai model * AutoModelForCausalLM * AutoModelForCausalLM mapping * qk norm and post ln optional * fix wrong shape of qk norm: megatron uses head_dim * automodel fixes * minor fix in forward * fix rope validation to accept llama3 scaling * `SwissAIForTokenClassification` support * Align `SwissAI` to v4.52.4 * Align `SwissAI` to v4.53.1 * Init CUDA xIELU * `SwissAI`->`Apertus` * ci fix * check_docstring ignore ApertusConfig * Licensing and placeholder tests * Placeholder doc * XIELU syntax * `_xielu_python` optimization * Fix xIELU * [tmp] `{beta,eps}` persistent=False until {beta,eps} saved in checkpoint * Modular `Apertus` * CUDA xIELU logging * ci fix * ci fix * ci fix * Update license Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * Update tests/models/apertus/test_modeling_apertus.py Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * `.utils.import_utils.is_torchdynamo_compiling` * `Apertus` class ordering * `past_key_value{->s}`, `make fix-copies` * ci fix * Remove unused configuration parameters * `{beta,eps}` saved in checkpoint * `{beta,eps}` Temporarily on CPU * Suggestions Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> * ci fix * remove fx_compatible (deprecated) * remove `rotary_embedding_layer` As the tests are written for a config without default scaling (which is not the case in Apertus) - besides, rope scaling is tested in other models so it's all safe. 
* fully removing `Mask4DTestHard` class Not needed (for now) * switch to `dtype` instead of `torch_dtype` Following this: https://github.com/huggingface/transformers/pull/39782 * remove unused imports * remove `cache_implementation="static"` * +Apertus to `docs/source/en/_toctree.yml` for the doc builder --------- Co-authored-by: Alexander Hagele <alexanderhagele@gmail.com> Co-authored-by: dhia680 <garbayad@gmail.com> Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com> Co-authored-by: Dhia Garbaya <84809366+dhia680@users.noreply.github.com>	2025-08-28 11:55:43 +02:00
ivarflakstad	721d4aee81	Include machine type in collated reports filename (#40514)	2025-08-28 09:28:12 +02:00
Cyril Vallez	98289c5546	[modular] Classes can now be defined and referenced in arbitrary order (without bringing unwanted dependencies) (#40507) * remove future class from dependency graph * convert all	2025-08-27 23:06:10 +02:00
Cyril Vallez	8b804311ba	[modular] Remove ambiguity in all calls to parent class methods + fix dependency graph (#40456) * fix in modular * remove leftover print * fix everything except when it's in assignment * fix assignment as well * more general * better * better * better comment * docstring * cleaner * remove base * doc	2025-08-27 14:51:28 +02:00
Cyril Vallez	a3afebbbbe	[modular] Use multi-processing + fix model import issue (#40481) * add mp and simplify a bit * improve * fix * fix imports * nit	2025-08-27 14:51:12 +02:00
Yih-Dar	80f4c0c6a0	CI when PR merged to `main` (#40451) * up * up * up * up * up * update --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-27 10:56:18 +02:00
StevenBucaille	78f32c3917	[pipeline] Add Keypoint Matching pipeline (#39970) * feat: keypoint-matcher pipeline * docs: added keypoint-matcher pipeline in docs * fix: added missing statements for repo consistency * docs: updated SuperGlue, LightGlue and EfficientLoFTR docs * Apply suggestions from code review Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * test: fixed run_pipeline_test * update pipeline typing and docs * update tests * update docs snippets * Fix import error * fix: pipeline init * pt framework --------- Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-08-26 15:26:57 +01:00
ivarflakstad	e68146fbe7	Fix collated reports model name entry (#40441)	2025-08-25 20:36:01 +00:00
ivarflakstad	7637d298b3	Fix collated reports uploading (#40440)	2025-08-25 21:49:59 +02:00
ivarflakstad	f0e87b436d	Fix collated reports model directory traversal (#40437) Fix model dir traversal	2025-08-25 18:01:58 +00:00
Joao Gante	1763ef2951	[docs] remove last references to `transformers` TF classes/methods (#40429) * halfway through tasks * complete * Update utils/check_docstrings.py	2025-08-25 16:30:59 +01:00
Joao Gante	c99ed492c7	[docs] remove flax references from `/en/model_doc` (#40311) * 1st commit * all models up to D * all models up to G * all models up to M * all remaining models	2025-08-21 10:52:54 +01:00
ivarflakstad	1054494dd6	Update notification service amd_daily_ci_workflows definition (#40314)	2025-08-20 17:49:46 +02:00
Yih-Dar	5d906740d2	Update CI with nightly torch workflow file (#40306) * fix nightly ci * Apply suggestions from code review Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>	2025-08-20 16:59:00 +02:00
Duc-Viet Hoang	ca543f822f	Add support for Florence-2 (#38188 ) * init * add modular * fixup * update configuration * add processing file * update auto files * update * update modular * green setup_and_quality ci * it works * fix some tests * commit florence2 * update test * make test cases done - 16 left * style * fix few test cases * fix some tests * fix init test * update florence2 vision style * hope is green * fix init test * fix init * update modular * refactor vision module * fix: channel attention use dynamic scale * update modular * update * update attention mask * update * fix naming * Update src/transformers/models/florence2/processing_florence2.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * spatial block works * more beautiful * more more beautiful * merge main * merge main and fixup * fix typing hint * update modeling * fix eager matches sdpa * fix style * fix compile test - all green * remove florence2 language * remove Florence2LanguageModel things * fix style * update florence2 model * override prepare encoder_decoder for generation * add weight conversion script * rewrite channel attention to use sdpa * eleminate 1 tranpose op * support fa2 * fix quality check * chore: reformat `test_modeling_florence2.py` * some refactor for processor * some refactor for processor * update naming convention and remove BC * make it pass the test * fix: correct Embedding Cosine * update comments and docstring * support input_embeds * support input embeds ideally * fix style * fix style * fix style again :D * add test prcoessor * refactor processor and add test for processor * reformat test processor * make fixup * fix schema check * remove image_token * ensure image token in tokenizer and fix integration tests * fix processor test * add more integration tests for large model and rename test_processor to test_processing * test_assisted_decoding_sample should pass * update doc and make model work with image text to text pipeline * docs: add sdpa 
bagde * resolve cyril's comments * fix import torch error * add helper get_placeholder_mask * inherit from llava * florence2 may not _supports_attention_backend because of bart ... * move florence2 model card to multimodal * let base model always return_dict * fix style * tiny update doc * set _checkpoint_conversion_mapping = {} * fix code quality * support flex and compile graph and move external func to internal func * remove condition because it always true * remove window funcs * move post processor config out * fix ci * new intro to trigger test * remove `kernel_size` argument --------- Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-08-20 14:28:06 +02:00
Joao Gante	da9452a592	[docs] delete more TF/Flax docs (#40289) * delete some TF docs * update documentation checks to ignore tf/flax * a few more removals * nit * Update utils/check_repo.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> --------- Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2025-08-20 10:44:14 +01:00
NielsRogge	1d46091737	Add MetaCLIP 2 (#39826) * First draft * Make fixup * Use eos_token_id * Improve tests * Update clip * Make fixup * Fix processor tests * Add conversion script * Update docs * Update tokenization_auto * Make fixup * Use check_model_inputs * Rename to lowercase * Undo CLIP changes * Address comment * Convert all checkpoints * Update auto files * Rename checkpoints	2025-08-20 09:25:43 +02:00
tic-top	5b3b7ea472	Add Kosmos-2.5 (#31711) Add Microsoft Kosmos-2.5 --------- Co-authored-by: kirp@umich.edu <tic-top> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>	2025-08-19 11:56:03 +02:00
Eon Kim	47938f8f8d	Add Ovis2 model and processor implementation (#37088 ) * Add Ovis2 model and processor implementation * Apply style fixes * Add unit tests for Ovis2 image processing and processor * Refactor image processing functions for clarity and efficiency * Add Ovis2 ImageProcessorFast * Refactor Ovis2 code * Refactor Ovis2 model components and update processor functionality * Fix repo consistency issues for Ovis2: docstring, config cleanup * Update Ovis2 model integration tests * Update Ovis2 configuration and processing classes for improved documentation * Remove duplicate entry for 'ovis2' in VLM_CLASS_NAMES * Fix conflict * Fix import order * Update image processor class names * Update Ovis2 model structure * Refactor Ovis2 configuration * Fix typos * Refactor Ovis2 model classes and remove unused code * Fix typos * Refactor Ovis2 model initialization * Fiix typos * Remove Ovis2 model mapping from MODEL_MAPPING_NAMES in modeling_auto.py * Add license and update type hints * Refactor token function and update docstring handling * Add license * Add Ovis2 model support and update documentation * Refactor Ovis2 model structure and enhance multimodal capabilities * Update Ovis2 weight mapping for consistency and clarity in key patterns * Remove unused 'grids' parameter from Ovis2 model and Update processing logic to handle image grids more efficiently. 
* Refactor Ovis2 model test structure to include Ovis2Model * Add optional disable_grouping param to Ovis2ImageProcessorFast * Refactor type hints in Ovis2 modules * Add licensing information in Ovis2 modules and tests * Refactor Ovis2 model by removing unused methods * Refactor Ovis2 model tests by renaming test classes and removing skipped tests * Refactor Ovis2 model output classes * Refactor Ovis2 weight conversion and Update model embedding classes * Refactor Ovis2 model imports and remove unused functions * Enhance vision configuration extraction in Ovis2 weight conversion * Refactor Ovis2 model's forward method to remove interpolation option * Update Ovis2 model documentation * Refactor Ovis2 model input handling and tokenizer configuration * Update return type hints in Ovis2 model * Remove commented-out code * fix config for tests and remove key mappings * Update tokenizer configuration to use add_special_tokens method * skip torchscript * Fix image placeholder generation in Ovis2Processor * Refactor Ovis2 model to rename visual_table to visual_embeddings_table * Enhance Ovis2 model by adding vision_feature_select_strategy parameter * Refactor Ovis2 model weights conversion and architecture * Refactor Ovis2 model by removing vision_feature_select_strategy parameter * Update Ovis2 model examples * Refactor Ovis2 model * Update Ovis2 model * Update Ovis2 model configuration * Refactor Ovis2 model test setup * Refactor flash attention support * Refactor * Fix typo * Refactor * Refactor model classes * Update expected output in Ovis2 * Refactor docstrings * Fix * Fix * Fix * Update input in tests * Fix * Fix get_decoder method * Refactor * Refactor Ovis2 * Fix * Fix * Fix test * Add get_placeholder_mask * Refactor Ovis2 model tests * Fix * Refactor * Fix * Fix * Fix Ovis2 test --------- Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>	2025-08-18 16:05:49 +02:00
Manal ML	3f4c85fef0	Add X-Codec model (#38248) * add working x-codec * nit * fix styling + copies * fix docstring * fix docstring and config attribute * Update args + config * update convertion script * update docs + cleanup * Ruff fix * fix doctrings	2025-08-15 16:24:12 +02:00
MAHIR DAIYAN	b02f2d8b6a	Add dates to the model docs (#39320) * added dates to the models with a single hf papers link * added the dates for models with multiple papers * half of no_papers models done * rest of no_papers models also done, only the exceptions left * added copyright disclaimer to sam_hw, cohere, cohere2 + dates * some more fixes, hf links + typo * some new models + a rough script * the script looks robust, changed all paper links to hf * minor change to handle technical reports along with blogs * ran make fixup to remove the white space * refactor	2025-08-14 10:08:46 -07:00
Pavel Iakubovskii	6f259bc83e	Fix docs typo (#40167 ) * DINOv3 model * working version * linter revert * linter revert * linter revert * fix init * remove flex and add convert to hf script * DINOv3 convnext * working version of convnext * adding to auto * Dinov3 -> DINOv3 * PR feedback * complete convert checkpoint * fix assertion * bf16 -> fp32 * add fast image processor * fixup * change conversion script * Use Pixtral attention * minor renaming * simplify intermediates capturing * refactor DINOv3ViTPatchEmbeddings * Refactor DINOv3ViTEmbeddings * [WIP] rope: remove unused params * [WIP] rope: rename period -> inv_freq for consistency * [WIP] rope: move augs * change inv_freq init (not persistent anymore) * [WIP] rope: move coords to init * rope - done! * use default LayerScale * conversion: truncate expected outputs * remove commented code * Refactor MLP layers * nit * clean up config params * nit docs * simplify embeddings * simplify compile compat lru_cache * fixup * dynamic patch coords * move augmentation * Fix docs * fixup and type hints * fix output capturing * fix tests * fixup * fix auto mappings * Add draft docs * fix dtype cast issue * add push to hub * add image processor tests * fixup * add modular * update modular * convert and test convnext * update conversion script * update prefix * Update LayerNorm * refactor DINOv3ConvNextLayer * rename * refactor convnext model * fix doc check * fix docs * fix convnext config * tmp fix for check docstring * remove unused arg * fix tests * (nit) change init * standardize gated MLP * clear namings and sat493m * fix tensors on different devices * revert linter * pr * pr feedbak ruff format * missing headers * fix code snippet and collection link in docs * DINOv3 description * fix checkpoints in tests * not doc fixes in configs * output_hidden_states * x -> features * remove sequential --------- Co-authored-by: Cijo Jose <cijose@meta.com>	2025-08-14 17:29:53 +02:00
Sangbum Daniel Choi	68a13cd4a6	Add Segment Anything 2 (SAM2) (#32317 ) * initial comment * test * initial conversion for outline * intermediate commit for configuration * chore:init files for sam2 * adding arbitary undefined config * check * add vision * make style * init sam2 base model * Fix imports * Linting * chore:sam to sam2 classes * Linting * Add sam2 to models.__init__ * chore:match prompt encoder with sam2 code * chore:prepare kwargs for mask decoder * Add image/video predictors * Add CUDA kernel * Add output classes * linting * Add logging info * tmp commit * docs for sam2 * enable image processing * check difference of original SAM2 - difference is the order of ToTensor() - please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize * enable promptencoder of sam2 * fix promprencoder * Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference) * Confirmed that ImageEncoder is exactly same (Be aware the linting of init) * Confirmed that MaskDecoder is exactly same (TO DO: lint variable name) * SamModel is now available (Need more chore for name) * make fix-copies * make style * make CI happy * Refactor VisionEncoder and PostioinEmbedding * TO DO : fix the image_embeddings and sparse_embeddings part * pure image inference done * reusable features fix and make style * styling * refactor memoryattention * tmp * tmp * refactor memoryencoder TO DO : convert and inference the video pipeline * TO DO : fix the image_encoder shape * conversion finish TO DO: need to check video inference * make style * remove video model * lint * change * python utils/check_docstringspy --check_all * python utils/check_config_attributes.py * remove copies for sam2promptencoder due to configuration * change __init__.py * remove tensorflow version * fix that to not use direct comparison * make style * add missing import * fix image_embedding_size * refactor Sam2 Attention * add fully working video inference 
(refactoring todo) * clarify _prepare_memory_conditioned_features * simplify modeling code, remove unused paths * use one model * use auto_docstring * refactor rope embeddings * nit * not using multimask when several points given * add all sam2.1 * add video tmp * add Sam2VideoSessionState + fast image proc + video proc * remove init_states from model * fix batch inference * add image integration tests * uniformize modeling code with other sam models and use modular * pass vision tests an most model tests * All tests passing * add offloading inference state and video to cpu * fix inference from image embedding and existing mask * fix multi_boxes mask inference * Fix batch images + batch boxes inference * improve processing for image inference * add support for mask generation pipeline * add support for get_connected_components post processing in mask generation * add fast image processor sam, image processor tests and use modular for sam2 image processor * fix mistake in sam after #39120 * fix init weights * refactor convert * add integration tests for video + other improvements * add needed missing docstrings * Improve docstrings and * improve inference speed by avoiding cuda sync * add test * skip test for vision_model * minor fix for vision_model * fix vision_model by adding sam2model and change the torch dependencies * remove patch_size * remove image_embedding_size * fix patch_size * fix test * make style * Separate hieradet and vision encoder in sam2 * fixup * review changes part 1 * remove MemoryEncoderConfig and MemoryAttentionConfig * pass q_stride instead of q_pool module * add inference on streamed videos * explicitely process streamed frames * nit * Improve docstrings in Sam2Model * update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel * improve video inference api * change inference_state to inference_session * use modular for Sam2Model * fix convert sam2 hf * modular * Update 
src/transformers/models/sam2/video_processing_sam2.py Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com> * fix minor config * fix attention loading error * update modeling tests to use hub checkpoints * Use CI A10 runner for integration tests values + higher tolerance for video integration tests * PR review part 1 * fix doc * nit improvements * enforce one input format for points, labels and boxes * nit * last few nits from PR review * fix style * fix the input type * fix docs * add sam2 model as conversion script * improve sam2 doc * nit fixes + optimization * split sam2 and sam2_video in two models * PR review part 1 * fix None for default slow processor of sam2 * remove unecessary code path in sam2_video * refactor/simplify RoPE * replace embedding module list with embedding matrix * fix tests * remove kernel * nit * use lru_cache for sine_pos_embeddings * reorder sam2_video methods * simplify sam2_video * PR review part 1 * simplify sam2 video a lot * more simplification * update integration tests with updated conftest * more explicit config for hieradet * do post_processing outside of sam2 video model * Improve Sam2VideoVisionRotaryEmbedding * fix tests * update docs and fix mask2former/oneformer * avoid unnecessary reshapes/permute * fix device concatenating points * small dtype fix * PR review * nit * fix style and finish up doc * fix style * fix docstrings * fix modular --------- Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com> Co-authored-by: Haitham Khedr <haithamkhedr@meta.com> Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local> Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co> Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>	2025-08-13 14:18:05 -04:00
Arthur	bec6926696	gpt oss is important (#40139)	2025-08-13 19:49:54 +02:00
ivarflakstad	ebceef343a	Collated reports (#40080) * Add initial collated reports script and job definition * provide commit hash for this run. Also use hash in generated artifact name. Json formatting * tidy * Add option to upload collated reports to hf hub * Add glob pattern for test report folders * Fix glob * Use machine_type as path filter instead of glob. Include machine_type in collated report	2025-08-13 14:48:15 +02:00
ivarflakstad	4668ef1459	Update notification service MI325 (#40078) add mi325 to amd_daily_ci_workflows	2025-08-12 10:22:52 +02:00
Yih-Dar	43001fd3c6	Fix `time_spent` in `notification_service.py`. (#40081) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-11 18:30:58 +02:00
Yih-Dar	f4d57f2f0c	Revert "fix `notification_service.py` about `time_spent`" (#40044) Revert "fix `notification_service.py` about `time_spent` (#40037)" This reverts commit d2ba153b29feb9cc0e9818c1ce63a07679b47250.	2025-08-08 22:32:24 +02:00
Yuxuan Zhang	7b20915f4e	GLM-4.5V Model Support (#39805 ) * init * update * uupdate * ruff * t patch is 2 defalut not 1 * draft * back * back1 * update * config update * update using glm-41 format * add self.rope_scaling = config.rope_scaling * update config * update * remove the processor * update * fix tests * update * for test * update * update 2126 * self.rope_scaling is missing in GLM4MOE lets add it * update * update * Update modular_glm4v_moe.py * change config * update apply_multimodal_rotary_pos_emb * format * update * Delete 3-rollout_qas_thinking_answers.py * use right name * update with place holder * update * use right rotary * Update image_processing_glm4v_fast.py * rope_config_validation needs to rewrite the entire config file in modular * update * changed name * update * Update modeling_glm4v_moe.py * _init_weights shoud be add in Glm4vMoePreTrainedModel * remove use_qk_norm * Update modular_glm4v_moe.py * remove use_qk_norm as it is not use * fix style * deprecations are not needed on new models * fix merge issues --------- Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Arthur <arthur.zucker@gmail.com>	2025-08-08 17:39:52 +02:00
Yih-Dar	d2ba153b29	fix `notification_service.py` about `time_spent` (#40037) temp Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-08 17:11:16 +02:00
Yoni Gozlan	513f76853b	Modular fix: remove the model name in `find_file_type` (#39897) * remove the model name in the class name * add comment	2025-08-06 23:31:07 +00:00
Manuel de Prada Corral	cf243a1bf8	Fix `fix_and_overwrite` mode of `utils/check_docstring.py` (#39369) * bug in fix mode of check_docstring	2025-08-06 19:37:25 +02:00
Yih-Dar	369c99d0ce	Avoid `utils/check_bad_commit.py` failing due to rate limit (requesting `api.github.com`) (#39918) fix Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2025-08-05 21:52:20 +02:00
Joao Gante	b771e476a8	[CI] post-`GptOss` fixes for green CI (#39929)	2025-08-05 20:04:59 +02:00
Arthur	7c38d8fc23	Add GPT OSS model from OpenAI (#39923 ) * fix * nice * where i am at * Bro this works * Update src/transformers/integrations/tensor_parallel.py * cleanups * yups that was breaking * Update src/transformers/models/openai_moe/modeling_openai_moe.py * gather on experts and not mlp * add changes for latest convert branch * adds options to get output_router_logits from config * bring chat temlate + special tokens back into the script. * initial commmit * update * working with shards * add model.safetensors.index.json * fix * fix * mxfp4 flag * rm print * Fix PAD/EOS/BOS (#18) * fix pad/eos/bos * base model maybe one day * add some doc * special tokens based on harmony. * add in tokenizer config as well. * prepare for rebase with main * Fix for initialize_tensor_parallelism now returning 4-tuple ``` [rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module> [rank0]: model = AutoModelForCausalLM.from_pretrained( [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained [rank0]: return model_class.from_pretrained( [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper [rank0]: return func(args, kwargs) [rank0]: ^^^^^^^^^^^^^^^^^^^^^ [rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained [rank0]: tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None) [rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank0]: ValueError: too many values to unpack (expected 3) ``` mxfp4 * mxfp4 draft * fix * fix import * draft * draft impl * finally working ! 
* simplify * add import * working version * consider blocks and scales * device mesh fix * initial commit * add working dequant + quant logic * update * non nan, gibberish output * working EP + quantization finally ! * start cleaning * remove reversing process * style * some cleaning * initial commmit * more cleaning * more cleaning * simplify * more cleaning * rm duplicated function * changing tp_plan * update tp plan check * add loading attribute * dequantizing logic * use subfunctions * import cleaning * update_param_name * adds clamped swiglu * add clamping to training path * simplify dequant logic * update * Bad merge * more simplifications & tests * fix ! * fix registering custom attention * fix order * fixes * some test nits * nits * nit * fix * Clamp sink logits * Clean * Soft-max trick * Clean up * p * fix deepspeed * update both modeling and modular for cleanup * contiguous * update tests * fix top_k router call * revert renaming * test nits * small fixes for EP * fix path for our local tests * update as I should not have broken that! * fix the loss of mixtral * revert part of the changes related to router_scores, kernel probably no ready for that! * deleting a small nit * update arch * fix post processing * update * running version but not expected output * moving to cuda * initial commit * revert * erroring when loading on cpu * updates * del blocks, scales * fix * style * rm comm * comment * add comment * style * remove duplicated lines * Fix minor issue with weight_map conversion script * fix sampling params * rename to final name * upate pre-final version of template * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py * fix batched inference * serve fixes * swizzle ! * update final chat template by Matt. * fix responses; pin oai * sinplify * Thanks Matt for his tireless efforts! 
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Matt <Rocketknight1@users.noreply.github.com> * fix * Use ROCm kernels from HUB * Make kernel modes explicit * update final chat template by Matt. x2 * Thanks Matt for his tireless efforts! Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> * Fix installation * Update setup.py Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com> * allow no content * fix: update message handling in write_tokenizer function * Fix template logic for user message role * last nits for CB and flash_paged! * there was one bad merge * fix CB (hardcode for now, its just using kv groups instead) * fix * better fix for device_map * minor device fix * Fix flash paged * updates * Revert "remove dtensors, not explicit (#39840)" This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3. * update * Revert "remove dtensors, not explicit (#39840)" This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3. * fix merge * fix * Fix line break when custom model indentity * nits testing * to locals first and pass sliding window to flash paged * register modes for MegaBlocksMoeMlp * add integration test in fixtures -> now update the tests to use it! * update integration tests * initial fix * style and update tests * fix * chore(gpt oss): remove mlp_bias from configuration It was just a leftover. 
* stats * Integration tests * whoops * Shouldn't move model * Ensure assistant messages without thinking always go to "final" channel * More checks to ensure expected format * Add pad_token_id to model configuration in write_model function (#51) * Add oai fix fast tests (#59) * Fix some fast tests * Force some updates * Remove unnecessary fixes * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py * reasoning -> Reasoning * Add additional integration tests * fixup * Slight fixes * align chat template with harmony * simplify * Add comment * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * torch testing assert close * Revert fixup * skip 2 test remove todo * merge * padding side should be left for integration tests * fix modular wrt to changes made to modeling * style * isort * fix opies for the loss * mmmm --------- Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: edbeeching <edbeeching@gmail.com> Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com> Co-authored-by: MekkCyber <mekk.cyber@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com> Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com> Co-authored-by: Zhuohan Li <zhuohan@openai.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co> Co-authored-by: 
joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal> Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com> Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Akos Hadnagy <akos@ahadnagy.com> Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com> Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co> Co-authored-by: Lysandre <hi@lysand.re> Co-authored-by: Matt <rocketknight1@gmail.com>	2025-08-05 18:02:18 +02:00
Cyril Vallez	380b2a0317	Rework add-new-model-like with modular and make test filenames coherent (#39612 ) * remove tf/flax * fix * style * Update add_new_model_like.py * work in progress * continue * more cleanup * simplify and first final version * fixes -> it works * add linter checks * Update add_new_model_like.py * fix * add modular conversion at the end * Update add_new_model_like.py * add video processor * Update add_new_model_like.py * Update add_new_model_like.py * Update add_new_model_like.py * fix * Update image_processing_auto.py * Update image_processing_auto.py * fix post rebase * start test filenames replacement * rename all test_processor -> test_processing * fix copied from * add docstrings * Update add_new_model_like.py * fix regex * improve wording * Update add_new_model_like.py * Update add_new_model_like.py * Update add_new_model_like.py * start adding test * fix * fix * proper first test * tests * fix * fix * fix * fix * modular can be used from anywhere * protect import * fix * Update add_new_model_like.py * fix	2025-08-04 14:41:09 +02:00
rziga	3951d4ad5d	Add MM Grounding DINO (#37925 ) * first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>	2025-08-01 15:43:23 +01:00
Yuanyuan Chen	1e0665a191	Simplify conditional code (#39781 ) * Use != Signed-off-by: cyy <cyyever@outlook.com> * Use get Signed-off-by: cyy <cyyever@outlook.com> * Format * Simplify bool operations Signed-off-by: cyy <cyyever@outlook.com> --------- Signed-off-by: cyy <cyyever@outlook.com>	2025-07-30 12:32:10 +00:00

1202 Commits