Compare commits

..

61 Commits

Author SHA1 Message Date
04de24dad3 Allow tokenizer.parse_response() to accept IDs/arrays directly 2025-10-24 18:59:51 +01:00
13cff621f5 Allow tokenizer.parse_response() to accept IDs/arrays directly 2025-10-24 18:59:51 +01:00
8bde822a86 Fix TypeError: find_adapter_config_file() got an unexpected keyword argument '_adapter_model_path' (#41604)
* Pass original dict instead of copy to maybe_load_adapters

* Revert "Pass original dict instead of copy to maybe_load_adapters"

This reverts commit 26fe1b3f35419fdc14932dfbda6bb39e4bdb9b34.

* Return cleaned version of adapter_kwargs

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-10-24 19:52:14 +02:00
9bb51b311f Share embedding modules in BART, not only weights (#41821)
* Share embedding modules in BART, not only weights

Embedding modules are now shared between encoder, decoder
and shared - it is the same module, like in the T5 implementation.

This has the benefit that it does not matter which module is returned
in `get_input_embeddings`, the caller of the latter can be sure that
modifications done to that (e.g., hooks) apply to the embeddings.

Background: While revamping the gradient checkpointing tests in PEFT via
peft#2860 we found that the gradient enable step
(`modeling_utils.enable_input_require_grads`) does not work for BART.
This leads to gradient checkpointing with `use_reentrant=True` to
fail as it will not detect any gradients. The reason for this is that
the returned value by `get_input_embeddings` (`self.shared`) is not
the module that is called in the encoder, therefore any hooks added
to `self.shared` are not run - in this case the hook set by
`enable_input_require_grads`.

Since the background is a missing hook I've added a test that tests
directly the ability to define hooks and their ability to being called.

* Add explanatory comment

* Don't initialize embeddings when not neccessary

* make fix-copies

---------

Co-authored-by: nemo <git@ningu.net>
2025-10-24 17:22:02 +02:00
090a8946c6 Fix const parsing for dict inputs in chat schemas (#41824)
* Fix const parsing for dict inputs

* make fixup
2025-10-24 15:14:06 +01:00
4faf675232 Fix Qwen2Audio flash attention mask format for generation (#41843)
* Fix Qwen2Audio flash attention mask format for generation

* use create_bidirectional_mask instead

* fix

* fix

* empty

* fix
2025-10-24 14:45:48 +02:00
bb6028cb79 Fix MXFP4 quantizer to support variable num_local_experts and hidden_size (#41795)
Fix MXFP4 quantizer to support variable num_local_experts
2025-10-24 14:18:52 +02:00
7935b869dc Remove redundant code from Qwen3VLProcessor (#41836)
* Remove redundant code from Qwen3VLProcessor

* same modification to modular_qwen3_vl.py
2025-10-24 11:08:49 +00:00
c27efe6e65 further reducing flakiness in utils/check_bad_commit.py (#41658) (#41815)
* 111

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-24 11:36:01 +02:00
8c291846f5 extend 2 blip2 and falcon_h1 test cases to xpu (#41825)
* extend 2 blip2 and falcon_h1 test cases to xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-24 11:15:15 +02:00
beb71b7575 extend 2 trainer test cases to xpu (#41829)
extend a trainer test cases to xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-24 11:11:15 +02:00
82451cbb30 extend bitnet cases to xpu, all 8 cases pass (#41831)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-24 11:05:12 +02:00
9c20660138 unpin torch/torchcodec for CircleCI (#41839)
CirCleCI with torch 2.9

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-24 08:19:38 +02:00
e4b920b3cf [Parakeet] add output_attention_mask (#41694)
* add output_attention_mask

* style
2025-10-23 23:09:20 +00:00
81b4f9882c transformers serve quantization docs + some api fixes for bitsandbytes (#41253)
* doc

* fix api

* fix

* fix

* fix

* fix args

* minor doc fix

* fix

* style

* rm check for now

* fix

* style

* Update docs/source/en/serving.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* add log and update value

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-23 16:00:54 +00:00
2a3f66d9d2 Deprecate warmup_ratio (#41326)
* dep

* style

* deprecate warmup_ratio

* better

* fix

* Revert "style"

This reverts commit cf4f9e7c4f7837a88eea6eeabf8b4dfe9455f6dc.

* Revert "dep"

This reverts commit 1800beb13f407ddb881d0af936860643e84ba085.

* update version
2025-10-23 17:17:21 +02:00
ca01fe4d13 transformers cli default flag fix (#41761) 2025-10-23 13:33:55 +00:00
f780932e05 Fixed some grammar mistakes (#41802)
Added spaces between words, fixed a typo and other errors
2025-10-23 12:39:58 +00:00
e7c5a60368 Fixed grammar mistakes (#41799)
* Fixed grammar mistakes

fixed a couple grammar mistakes

* Update README.md

* Change phrasing a bit more

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-10-23 12:34:02 +00:00
91b5a680c0 [Trainer] remove env vars (#41697)
* remove env var

* style

* fix value

* update

* fix

* style

* fix

* maybe this time

* rm tests

* fix
2025-10-23 14:17:20 +02:00
d4562bb8ae Fix Qwen3Next dtype API usage (#41735)
Replace torch.get_current_dtype() with torch.get_default_dtype() to fix FLA compatibility
2025-10-23 12:02:02 +00:00
e46c2ff32e Add a safeguard around a flaky test in gemma2 (#41811)
* Fix _compile flag in flex attn integration

* Revert fix and add precaution around test
2025-10-23 12:36:50 +02:00
3b6ddbcb88 make apollo test case pass (#41805)
make apollo test cases pass

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-23 12:07:31 +02:00
ff04520266 Bump AMD docker (#41792) 2025-10-23 10:44:20 +02:00
01f5ac70a3 flash attn pytest marker (#41781)
* flash attn marker

* 111

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-23 08:39:19 +00:00
2c5b888c95 [Onnx docs] Remove some traces (#41791)
fix
2025-10-23 10:34:25 +02:00
0eb372ba19 [quantization] fix torchao tests after 0.14.0 release (#41777)
* initial commit

* clean int4_weight_only

* make style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-23 08:26:44 +00:00
87be559508 Fix attention mask in mamba layers (#41790)
* not all mamba models are like LFM

* compile friendly

* adjust slow tests expectation

* naming
2025-10-22 18:15:38 +02:00
2ca506ca1d Fix chat schema tests (#41793)
* Fix chat schema tests

* make fixup
2025-10-22 15:00:49 +00:00
5426947e3a fix type annotation typo in docstring (#41788) 2025-10-22 13:58:18 +00:00
93671b4444 Swap columns and rows of the grid layout in LFM2-VL (#41755)
* swap columns and rows of the grid layout

* update integration tests

* fix the test case

* revert batched test change
2025-10-22 12:52:06 +00:00
18a3349a9f [quantization] Skip Fp8 tests when hardware capability < 8.9 (#41785)
* skipping tests

* style

* nit
2025-10-22 13:33:28 +02:00
e9f241bf89 [quantization] fix compressed_tensors tests (#41780)
fixing tests
2025-10-22 12:37:07 +02:00
7cd1d2b66c [v5] Delete legacy chat template saving (#41648)
* delete lagcy chat template saving

* fix tests

* fix qwen audio
2025-10-22 09:40:55 +00:00
48a36c96da fix: Gemma 3 weights conversion vision and multimodal projector paths (#41767)
fix: Gemma 3 vision and multimodal projector paths
2025-10-22 09:38:56 +00:00
9a27302803 Fix CUDA index out of bounds for q_idx in VLM token type masking for Gemma3, PaliGemma, and example modular (#41757)
* Fix CUDA index out of bounds for q_idx in Gemma3 token type masking

* Fix CUDA index out of bounds for q_idx in modular modeling_new_task_model

* Revert "Fix CUDA index out of bounds for q_idx in Gemma3 token type masking"

This reverts commit f8e5c2a42c305aebd00c46161bf22f520009c8fc.

* Fix CUDA index out of bounds for q_idx in PaliGemma token type masking

* Fix CUDA index out of bounds for q_idx in Gemma3 token type masking
2025-10-22 11:29:47 +02:00
4f8781f84f Remove invalid @staticmethod from module-level get_device_and_memory_breakdown (#41747)
Remove staticmethod decorator from function
2025-10-22 10:52:29 +02:00
a8cece13e2 Fix bark after #41445 (#41645)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-22 09:21:45 +02:00
2e67a9b602 Add LightGlue fast image processor (#41670)
* add fast image processor skel

* add working lightglue fast image processor + tests

* remove plot_keypoint_matching
2025-10-22 00:33:16 +02:00
264cce9e0a Chat response parsing (#40894)
* Initial commit

* Adding more tests, bugfixes, starting tool tests

* Add support for JSON parsers and some tool tests

* stash commit

* stash commit

* stash commit

* stash commit

* stash commit

* Fix cohere schema, fix a lot of the recursive parser code

* GPT-OSS passing too!

* Update tests

* make fixup

* Offset tracking partially done

* stash commit

* stash commit

* Assistant masking Just Works

* make fixup

* stash commit

* stash commit

* JMESPath approach

* stash commit before i rip this PR apart

* Remove broken offset code

* Remove broken offset code

* Update chat parsing code and add tests for Ernie + fix Cohere tests for new format

* Implement tokenizer method

* jmespath dependency handling

* Completed TODOs

* Add support to TextGenerationPipeline

* Update GPT-OSS schema and test cases

* make fixup

* Fix typing (??)

* missing future import

* Use old typing in tokenization_utils_base.py

* put jmespath in various extras

* Remove accidental newline

* Guard tests correctly

* Remove require_jinja on the schema tests since we don't actually apply chat templates there

* make fixup

* fix some bad linter changes

* Fix docstring

* Push draft documentation

* Extend tests, more documentation

* make fixup

* docs docs docs

* Add Processor support

* Add to toctree

* Flag markdown correctly

* Remove double backslashes in docs for simplicity

* Simplify node-regex-to-dict

* Add support to ImageTextToTextPipeline

* Add support to ImageTextToTextPipeline and save/loading support in Processors

* Begin reworking docs to start fitting in response parsing

* Fix rebase

* Expand documentation further

* Expand documentation further

* Refactor x-regex-to-dict to x-regex-key-value, update the parser logic docs section

* Refactor x-regex-to-dict to x-regex-key-value, update the parser logic docs section

* More docs update

* Update TextGenerationPipeline to support tools properly

* Some rebase fixes

* Re-add is_jmespath_available

* Re-add is_jmespath_available

* Add Qwen3 parser and test, add maybe-json support

* Rollback processor changes - we'll wait for legacy saving to be deprecated

* Make fixup

* Revert ImageTextToText changes for now

* Add pipeline test

* make fixup

* Resolve a todo

* Resolve more TODOs and clean up the spec a little

* Add ref in the tools doc

* Update docs/source/en/chat_response_parsing.md

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/utils/chat_parsing_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add a docstring for parse_response

* Add function docstring and reference it in the docs

* Fix generate link

* Revert Processor changes for now

* Use updated GPT-OSS format

* Print the dict keys instead of the whole dict so the example doesn't become too big

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-10-21 17:26:18 +01:00
3f2db2c205 Simplify pipeline padding logic (#41667)
* Remove a lot of unnecessary pad logic

* Remove unnecessary clone() calls since we're just doing a slice assignment

* Just make the full tensor instead of adding to a zeros tensor
2025-10-21 17:01:48 +01:00
1d651c749e Modernize CLIP modeling code (#41546)
* stranded

* update modular

* modularities

* update

* fx broken

* fx stillb roken

* update

* missed this

* fix metaclip
2025-10-21 16:04:43 +02:00
f39355ec23 [v5] Remove deprecated tranformers.onnx (#41700)
* Remove deprecated tranformers.onnx

* Remove transformers.onnx related doc

* style

* shouldn't have been removed

* fix mismatch between metaclip2 modular en config file

* remove onnx config from not_doctested.txt

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-10-21 15:22:41 +02:00
5995435d96 Update type hints in modeling_rope_utils.py to use | syntax (#41714)
* Update type hints to use | syntax for Union types

- Replace Union[str, os.PathLike] with str | os.PathLike
- Replace Optional[Union[str, dict]] with str | dict | None
- Keep Union for forward references like 'torch.dtype'
- Update imports to remove unused Union import where possible

This modernizes the type hints to use Python 3.10+ syntax while maintaining
compatibility with forward references.

* Update type hints in modeling_rope_utils.py to use | syntax

- Replace Union[float, dict[str, float]] with float | dict[str, float]
- Remove unused Union import
- Maintain backward compatibility

This modernizes the type hints to use Python 3.10+ syntax.
2025-10-21 12:05:11 +00:00
2383f3fcbb Fix graphormer model compilation with Cython 3.1.4 (#41671)
Hitting this kind of error when running:

```
cython src/transformers/models/deprecated/graphormer/algos_graphormer.pyx
```

```
Error compiling Cython file:
------------------------------------------------------------
...
    (nrows, ncols) = path.shape
    assert nrows == ncols
    cdef unsigned int n = nrows
    cdef unsigned int max_dist_copy = max_dist

    path_copy = path.astype(long, order='C', casting='safe', copy=True)
                            ^
------------------------------------------------------------
src/transformers/models/deprecated/graphormer/algos_graphormer.pyx:88:28: undeclared name not builtin: long
```

This appears to have changed between cython==3.0 and cython==3.1.  AFAICT the
correct type to use here would be `int`.  Switching to it makes the command
succeed and generate an algos_graphormer.c file.
2025-10-21 12:02:23 +00:00
c4e88f78ca Reduce warning noise caused by Tensor.new_tensor (#41748) 2025-10-21 11:54:13 +00:00
2fe4a30340 [kernels] Add version to function mapping (#41685)
add version
2025-10-21 13:27:18 +02:00
ede7976cd2 Fixed incorrect model_type for qwen2vl and qwen2.5vl when config is saved and loaded again (#41758)
* fixed incorrect model_type for qwen2vl and qwen2.5vl

* added tests
2025-10-21 10:54:58 +00:00
ee3a1002e2 [v5] Delete videos from image processing classes (#41607)
* delete

* why there were video tests in image file

* fix tests and copies

* docs and autto class
2025-10-21 12:03:31 +02:00
4e50b8459d upgrade xpu docker file to torch 2.8 (#41551)
* upgrade xpu docker file to torch 2.8

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update Dockerfile

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-21 10:03:04 +02:00
9aab965b1e Add vision contribution guide (#41456)
* vision contrib guide

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* update tiny things

---------

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
2025-10-20 18:56:47 +02:00
1a034ce1d2 Docs(zh-hans): Refine wording for professionalism in README (#40943)
* [docs] Polish Chinese README translation by replacing informal terms  with professional vocabulary

* [docs] Polish Simplified Chinese README for better professionalism and consistency

- Replace "抱抱脸" with "Hugging Face" to align with standard usage in Chinese developer community
- Replace "流水线" with "pipeline" to maintain consistency with code and technical terminology
- Add proper code formatting (`pipeline`) for API references to match Traditional Chinese version
- Update translation dictionary to reflect these standardized terms
- Improve overall readability and technical accuracy for Chinese developers

These changes enhance the professionalism of the documentation while maintaining consistency with established technical terminology used by the Chinese developer community.
2025-10-20 08:39:49 -07:00
6850ba853f Small Fix for imports (#41411)
small fix
2025-10-20 17:21:04 +02:00
bf0bce8d5f Apply RUFF PIE rules (#41727)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-20 13:32:23 +00:00
2cf8f833b0 Fix documentation issues (#41726)
Fix more documentation issues

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-20 13:31:02 +00:00
517197f795 Update type hints in tokenization_utils.py to use | syntax (#41713)
* Update type hints to use | syntax for Union types

- Replace Union[str, os.PathLike] with str | os.PathLike
- Replace Optional[Union[str, dict]] with str | dict | None
- Keep Union for forward references like 'torch.dtype'
- Update imports to remove unused Union import where possible

This modernizes the type hints to use Python 3.10+ syntax while maintaining
compatibility with forward references.

* Update type hints in tokenization_utils.py to use | syntax

- Replace Union[AddedToken, str] with AddedToken | str
- Replace Union[list[str], list[AddedToken]] with list[str] | list[AddedToken]
- Replace Union[str, list[str]] with str | list[str]
- Replace Union[int, list[int]] with int | list[int]
- Update error messages to use | syntax
- Maintain backward compatibility

This modernizes the type hints to use Python 3.10+ syntax.

* Fix error message formatting in tokenization_utils.py

- Fix error message to use Union syntax instead of | syntax in string
- This prevents potential issues with error message formatting
- Maintains type hint modernization while fixing error messages
2025-10-20 13:24:16 +00:00
9d4ee18e25 [doc] remove broken notebooks on AMD Dev Cloud (#41743)
revert
2025-10-20 14:36:53 +02:00
818f7f10e4 Revert "Remove upper version bound of pandas" (#41744)
Revert "Remove upper version bound of pandas (#41677)"

This reverts commit a15d77cd0c3e4c1d9f0a196da5996b735eead37e.
2025-10-20 14:25:32 +02:00
ce4ffeeb6c Fix typo in LFM-VL (#41742)
oops, remove untrelated commits
2025-10-20 13:55:41 +02:00
cb6f03fce4 Fix Qwen3-Omni inference when mixing video and image inputs in one batch (#41741)
* Fix qwen3omni inference when mixing video and image inputs in one batch

* Fix `router_aux_loss_coef`

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-10-20 11:35:02 +00:00
8fc5420913 Gemma3 conversion script maintenance (#41704)
* conversion: add include_vision_encoder flag (default true)

* conversion: update for inverted model.language_model weight path

* conversion: revert include_vision_encoder to True by default

* conversion: add chat template path flag
2025-10-20 12:52:22 +02:00
286 changed files with 3017 additions and 7711 deletions

View File

@ -22,7 +22,6 @@ tests/generation/ @gante
/src/transformers/models/auto/ @ArthurZucker
/src/transformers/utils/ @ArthurZucker @Rocketknight1
/src/transformers/loss/ @ArthurZucker
/src/transformers/onnx/ @michaelbenayoun
# Specific files come after the sections/globs, so they take priority
/.circleci/config.yml @ArthurZucker @ydshieh

View File

@ -3,7 +3,7 @@ name: Build docker images (scheduled)
on:
push:
branches:
- fix_docker_file
- build_ci_docker_image*
repository_dispatch:
workflow_dispatch:
workflow_call:
@ -42,6 +42,315 @@ jobs:
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
REF=fix_docker_file
REF=main
push: true
tags: huggingface/transformers-all-latest-gpu-test
tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-all-latest-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-all-latest-gpu-push-ci docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-torch-deepspeed-docker:
name: "Latest PyTorch + DeepSpeed"
runs-on:
group: aws-g4dn-2xlarge-cache
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu${{ inputs.image_postfix }}
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER}}
title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
# Can't build 2 images in a single job `latest-torch-deepspeed-docker` (for `nvcr.io/nvidia`)
latest-torch-deepspeed-docker-for-push-ci-daily-build:
name: "Latest PyTorch + DeepSpeed (Push CI - Daily Build)"
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu-push-ci docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
doc-builder:
name: "Doc builder"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-doc-builder
push: true
tags: huggingface/transformers-doc-builder
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-doc-builder docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-pytorch:
name: "Latest PyTorch [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-gpu
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-pytorch-gpudocker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-pytorch-amd:
name: "Latest PyTorch (AMD) [dev]"
runs-on:
group: aws-highcpu-32-priv
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu-push-ci build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-pytorch-deepspeed-amd:
name: "PyTorch + DeepSpeed (AMD) [dev]"
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-pytorch-deepspeed-amd-gpu build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-quantization-torch-docker:
name: "Latest Pytorch + Quantization [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-quantization-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-quantization-latest-gpu${{ inputs.image_postfix }}
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-quantization-latest-gpu build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

View File

@ -112,7 +112,125 @@ New models are constantly released and if you want to implement a new model, ple
If you are willing to contribute the model yourself, let us know so we can help you add it to 🤗 Transformers!
We have a technical guide for [how to add a model to 🤗 Transformers](https://huggingface.co/docs/transformers/add_new_model).
We have a technical guide for [how to add a model to 🤗 Transformers](https://huggingface.co/docs/transformers/modular_transformers).
### Vision-Language Model Contribution Checklist
If you're contributing a **vision-language model** (or any multimodal model that processes images/videos), please follow this checklist. Maintainers will use this to review your PR, and completing these steps will significantly increase the likelihood of your PR being merged quickly.
**Required checklist for all vision-language model contributions:**
**1. Implement a modular file**
All new models should use the modular architecture pattern. Create a `modular_<model_name>.py` file using the modular model converter:
- Use the CLI, [`transformers add-new-model-like`](https://github.com/huggingface/transformers/blob/main/src/transformers/cli/add_new_model_like.py) to generate a modular skeleton and get started
- All code should be in the modular file if possible. Modeling must be in it, it's better if configuration is in it as well.
- Reuse existing patterns from similar models as much as possible
To verify your modular file is correct, run:
```bash
python utils/modular_model_converter.py <model_name>
```
This will generate the separate files (`modeling_*.py`, `configuration_*.py`, etc.) from your modular file. The CI will enforce that these generated files match your modular file.
**2. Add a fast image processor (for image models)**
If your model processes images, implement a fast image processor that uses `torch` and `torchvision` instead of PIL/numpy for better inference performance:
- See the detailed guide in [#36978](https://github.com/huggingface/transformers/issues/36978)
- Fast processors inherit from `BaseImageProcessorFast`
- Examples: `LlavaOnevisionImageProcessorFast`, `Idefics2ImageProcessorFast`
**3. Create a weight conversion script**
Add a `convert_<model_name>_to_hf.py` script that converts the original model weights to the HuggingFace format:
- Script should handle checkpoint loading, key mapping, and saving in HF format
- Include usage examples and documentation in the script
- Examples: [`convert_llava_onevision_weights_to_hf.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava_onevision/convert_llava_onevision_weights_to_hf.py), [`convert_idefics2_weights_to_hf.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/idefics2/convert_idefics2_weights_to_hf.py)
**4. Add integration tests with exact output matching**
At minimum, add an `IntegrationTest` class that tests end-to-end generation (processing and modelling) with **exact** output matching:
- For generative models: test that generated text matches expected output exactly
- For non-generative models: test that output logits match expected values
- Tests should use real checkpoints (load in 4-bit or half precision if the checkpoint is too big to fit in our CI runners) and real inputs
- Example pattern:
```python
class MyModelIntegrationTest(unittest.TestCase):
@slow
def test_model_integration(self):
model = MyModelForConditionalGeneration.from_pretrained("org/model-name")
processor = AutoProcessor.from_pretrained("org/model-name")
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
EXPECTED_TEXT = "exact expected output"
self.assertEqual(processor.decode(output[0]), EXPECTED_TEXT)
```
See `tests/models/llava_onevision/test_modeling_llava_onevision.py` for complete examples.
**5. Update documentation**
Add or update model documentation:
- Create if the cli hasn't `docs/source/en/model_doc/<model_name>.md` with usage examples
- Include model description, paper link, and basic usage with `Pipeline` and `AutoModel`
- Add the model to the appropriate TOC files
**6. Look for reusable patterns**
The library has 400+ models with many established patterns:
- Search for similar models (e.g., other vision-language models)
- Reuse attention mechanisms, layer implementations, and processing patterns
- Check models like LLaVA, Idefics2, Fuyu for vision-language patterns
- Use provided decorators like (`auto_docstring`, `can_return_tuple`, `check_model_inputs` and `_can_record_outputs`) where relevant.
- Don't reinvent the wheel
**7. Run quality checks and read the output**
Before submitting your PR, install quality dependencies and run the full check suite:
```bash
pip install -e ".[quality]"
make fixup
```
**Important**: Take time to read the output of `make fixup`. It will:
- Lint and format your code automatically
- Run consistency checks (imports, docstrings, etc.)
- Show any remaining issues that need manual fixes
All checks must pass before your PR can be merged.
**If this checklist is complete, your PR has a very high likelihood of being merged!** Following these steps makes the maintainers' work much easier and will reduce the number of review iterations, getting your important work out there faster.
#### Copy-pastable checklist for maintainers
Here's a condensed version maintainers can copy into PRs:
```markdown
## Multimodal Model Addition Checklist
Please ensure your PR completes all following items. See the [full checklist](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#vision-language-model-contribution-checklist) for details.
- [ ] **Modular file**: `modular_<model_name>.py` implemented and verified with `python utils/modular_model_converter.py <model_name>`
- [ ] **Fast image processor**: Implemented using `BaseImageProcessorFast` (see [#36978](https://github.com/huggingface/transformers/issues/36978))
- [ ] **Conversion script**: `convert_<model_name>_to_hf.py` added with usage examples
- [ ] **Integration tests**: End-to-end tests with exact output matching (text or logits)
- [ ] **Documentation**: Model docs added/updated in `docs/source/en/model_doc/`
- [ ] **Pattern reuse**: Verified against similar models (LLaVA, Idefics2, etc.)
- [ ] **Quality checks**: `make fixup` passes with no errors
```
## Do you want to add documentation?

View File

@ -64,8 +64,8 @@ limitations under the License.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal model, for both inference and training.
Transformers acts as the model-definition framework for state-of-the-art machine learning with text, computer
vision, audio, video, and multimodal models, for both inference and training.
It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the
pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training

View File

@ -87,6 +87,8 @@ def pytest_configure(config):
config.addinivalue_line("markers", "not_device_test: mark the tests always running on cpu")
config.addinivalue_line("markers", "torch_compile_test: mark test which tests torch compile functionality")
config.addinivalue_line("markers", "torch_export_test: mark test which tests torch export functionality")
config.addinivalue_line("markers", "flash_attn_test: mark test which tests flash attention functionality")
config.addinivalue_line("markers", "flash_attn_3_test: mark test which tests flash attention 3 functionality")
os.environ["DISABLE_SAFETENSORS_CONVERSION"] = "true"

View File

@ -5,7 +5,7 @@ ARG REF=main
RUN apt-get update && apt-get install -y time git g++ pkg-config make git-lfs
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN uv pip install --no-cache-dir --upgrade 'torch<2.9' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --upgrade 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir pypi-kenlm
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[quality,testing,torch-speech,vision]"
RUN git lfs install

View File

@ -17,7 +17,7 @@ RUN make install -j 10
WORKDIR /
RUN uv pip install --no-cache --upgrade 'torch<2.9' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache --upgrade 'torch' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ja,testing,sentencepiece,spacy,ftfy,rjieba]" unidic unidic-lite
# spacy is not used so not tested. Causes to failures. TODO fix later

View File

@ -5,7 +5,7 @@ USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git-lfs ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch<2.9' 'torchaudio' 'torchvision' 'torchcodec<0.8' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]" seqeval albumentations jiwer

View File

@ -5,7 +5,7 @@ USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git libgl1 g++ tesseract-ocr git-lfs curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch<2.9' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --no-deps timm accelerate
RUN uv pip install -U --no-cache-dir pytesseract python-Levenshtein opencv-python nltk
# RUN uv pip install --no-cache-dir natten==0.15.1+torch210cpu -f https://shi-labs.com/natten/wheels

View File

@ -5,7 +5,7 @@ USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch<2.9' 'torchaudio' 'torchvision' 'torchcodec<0.8' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]"

View File

@ -5,7 +5,7 @@ USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git-lfs ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch<2.9' 'torchaudio' 'torchvision' 'torchcodec<0.8' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing,tiktoken,num2words,video]"

View File

@ -1,8 +1,6 @@
FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics,video,display,compat32
ARG DEBIAN_FRONTEND=noninteractive
# Use login shell to read variables from `~/.profile` (to pass dynamic created variables between RUN commands)

View File

@ -1,4 +1,4 @@
FROM rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.7.1
FROM rocm/pytorch:rocm7.0.2_ubuntu24.04_py3.12_pytorch_release_2.7.1
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
@ -10,8 +10,8 @@ RUN apt update && \
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip numpy
RUN python3 -m pip install --no-cache-dir --upgrade importlib-metadata setuptools ninja git+https://github.com/facebookresearch/detectron2.git pytesseract "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir --upgrade pip numpy importlib-metadata setuptools wheel ninja pytesseract "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir --no-build-isolation git+https://github.com/facebookresearch/detectron2.git
ARG REF=main
WORKDIR /
@ -39,6 +39,7 @@ RUN python3 -m pip install --no-cache-dir "torchcodec==0.5"
# Install flash attention from source. Tested with commit 6387433156558135a998d5568a9d74c1778666d8
RUN git clone https://github.com/ROCm/flash-attention/ -b tridao && \
cd flash-attention && \
GPU_ARCHS="gfx942" python setup.py install
GPU_ARCHS="gfx942;gfx950" python setup.py install
# GPU_ARCHS builds for MI300, MI325 and MI355
RUN python3 -m pip install --no-cache-dir einops

View File

@ -3,11 +3,10 @@ LABEL maintainer="Hugging Face"
SHELL ["/bin/bash", "-c"]
ARG PYTHON_VER=3.11
ARG PYTHON_VER=3.12
ENV TORCH_DEVICE_BACKEND_AUTOLOAD=0
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get remove -y python3.10 && apt-get autoremove -y
RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
@ -23,7 +22,6 @@ RUN apt-get update && \
apt-utils \
build-essential \
ca-certificates \
clinfo \
curl \
git \
git-lfs \
@ -35,7 +33,6 @@ RUN apt-get update && \
rsync \
sudo \
libnl-genl-3-200 \
xpu-smi \
unzip \
ffmpeg \
tesseract-ocr \
@ -45,34 +42,47 @@ RUN apt-get update && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
apt-get install -y \
linux-headers-$(uname -r) \
linux-modules-extra-$(uname -r) \
linux-headers-$(uname -r) linux-modules-extra-$(uname -r) \
flex bison \
intel-fw-gpu intel-i915-dkms xpu-smi \
intel-fw-gpu intel-i915-dkms xpu-smi intel-ocloc clinfo\
intel-opencl-icd libze-intel-gpu1 libze1 \
intel-media-va-driver-non-free libmfx-gen1 libvpl2 \
libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libegl-mesa0 libegl1 libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo intel-ocloc \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo \
libigc-dev intel-igc-cm libigdfcl-dev libigfxcmrt-dev libze-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
RUN pip install triton==3.3.0
# Use virtual env because Ubuntu-24 does not allowed pip on original python
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.local/bin:$PATH"
ENV VIRTUAL_ENV="/opt/venv"
ENV UV_PYTHON_INSTALL_DIR=/opt/uv/python
RUN uv venv --python ${PYTHON_VER} --seed ${VIRTUAL_ENV}
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/xpu --no-cache-dir
RUN pip install --upgrade pip wheel
RUN pip install triton==3.4.0
RUN pip install evaluate torchdata pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sacremoses nltk rouge_score librosa soundfile g2p_en mpi4py requests_mock
RUN pip install pretty_midi essentia resampy Levenshtein av sacrebleu phonemizer invisible_watermark schedulefree
RUN pip install gguf hqq compressed_tensors gptqmodel mergekit autoawq deepspeed torchao onnx
RUN pip install hf_transfer huggingface-hub hf-doc-builder datasets optimum-quanto timm transformers accelerate optimum peft
RUN pip install torch==2.8.0+xpu torchvision==0.23.0+xpu torchaudio==2.8.0+xpu --index-url https://download.pytorch.org/whl/xpu --no-cache-dir
RUN pip install torchcodec torchdata --no-cache-dir
RUN pip install evaluate pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sacremoses nltk rouge_score librosa soundfile g2p_en mpi4py requests_mock
RUN pip install pretty_midi essentia resampy Levenshtein av sacrebleu phonemizer invisible_watermark schedulefree setuptools
RUN pip install gptqmodel --no-build-isolation
RUN pip install gguf hqq compressed_tensors autoawq deepspeed torchao onnx auto_round
RUN pip install hf_transfer huggingface-hub hf-doc-builder datasets optimum-quanto timm transformers accelerate optimum peft diffusers trl kernels
# install liger-kernel
RUN pip install git+https://github.com/linkedin/Liger-Kernel.git --extra-index-url https://download.pytorch.org/whl/test/xpu
# install mergekit
RUN pip install --break-system-packages git+https://github.com/arcee-ai/mergekit.git@v0.1.3
# install bitsandbytes
RUN pip install git+https://github.com/bitsandbytes-foundation/bitsandbytes.git

View File

@ -24,7 +24,7 @@ pip install -e ".[dev]"
```
> [!NOTE]
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to workaround it.
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to work around it.
Then you need to install our special tool that builds the documentation:
@ -38,7 +38,7 @@ pip install git+https://github.com/huggingface/doc-builder
## Building the documentation
Once you have setup the `doc-builder` and additional packages, you can generate the documentation by
Once you have set up the `doc-builder` and additional packages, you can generate the documentation by
typing the following command:
```bash
@ -295,12 +295,11 @@ Here's an example of a tuple return, comprising several objects:
Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference
them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
to this dataset.
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate them to this dataset.
## Styling the docstring
We have an automatic script running with the `make style` comment that will make sure that:
We have an automatic script running with the `make style` command that will make sure that:
- the docstrings fully take advantage of the line width
- all code examples are formatted using black, like the code of the Transformers library

View File

@ -258,8 +258,6 @@
# title: النماذج
# - local: main_classes/text_generation
# title: توليد النصوص
# - local: main_classes/onnx
# title: ONNX
# - local: main_classes/optimizer_schedules
# title: التحسين
# - local: main_classes/output

View File

@ -32,7 +32,7 @@
لتصدير نموذج 🤗 Transformers إلى ONNX، قم أولاً بتثبيت اعتماد إضافي:
```bash
pip install optimum[exporters]
pip install optimum-onnx
```
للاطلاع على جميع المعامﻻت المتاحة، يرجى الرجوع إلى [وثائق 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli)، أو عرض المساعدة في سطر الأوامر:
@ -111,60 +111,3 @@ optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_s
### تصدير نموذج لهندسة غير مدعومة
إذا كنت ترغب في المساهمة من خلال إضافة دعم لنموذج لا يُمكن تصديره حاليًا، فيجب عليك أولاً التحقق مما إذا كان مدعومًا في [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview)، وإذا لم يكن مدعومًا، [فيمكنك المساهمة في 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute) مُباشرةً.
### تصدير نموذج باستخدام `transformers.onnx`
<Tip warning={true}>
لم يعد يتم دعم `transformers.onnx` يُرجى تصدير النماذج باستخدام 🤗 Optimum كما هو موضح أعلاه. سيتم إزالة هذا القسم في الإصدارات القادمة.
</Tip>
لتصدير نموذج 🤗 Transformers إلى ONNX باستخدام `transformers.onnx`، ثبّت التبعيات الإضافية:
```bash
pip install transformers[onnx]
```
استخدم حزمة `transformers.onnx` كنموذج Python لتصدير نقطة حفظ باستخدام تكوين جاهز:
```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
يُصدّر هذا رسمًا بيانيًا ONNX لنقطة الحفظ المُحددة بواسطة وسيطة `--model`. مرر أي نقطة حفظ على 🤗 Hub أو نقطة حفظ مُخزنة محليًا.
يُمكن بعد ذلك تشغيل ملف `model.onnx` الناتج على أحد المُسرعات العديدة التي تدعم معيار ONNX. على سبيل المثال، قم بتحميل وتشغيل النموذج باستخدام ONNX Runtime كما يلي:
```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # يتوقع ONNX Runtime مصفوفات NumPy كمدخلات
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```
يُمكن الحصول على أسماء المخرجات المطلوبة (مثل `["last_hidden_state"]`) من خلال إلقاء نظرة على تكوين ONNX لكل نموذج. على سبيل المثال، بالنسبة لـ DistilBERT، لدينا:
```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```
العمليات مُتطابقة لنقاط الحفظ TensorFlow على Hub. على سبيل المثال، صدّر نقطة حفظ TensorFlow خالصة كما يلي:
```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```
لتصدير نموذج مُخزن محليًا، احفظ أوزان النموذج ومجزىء اللغوى في نفس الدليل (على سبيل المثال `local-pt-checkpoint`)، ثم قم بتصديره إلى ONNX عن طريق توجيه وسيط `--model` لحزمة `transformers.onnx` إلى الدليل المطلوب:
```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```

View File

@ -88,6 +88,8 @@
title: Tool use
- local: chat_templating_writing
title: Writing a chat template
- local: chat_response_parsing
title: Response parsing
title: Chat with models
- sections:
- local: serving

View File

@ -95,9 +95,12 @@ print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):]))
The chat model called the `get_current_temperature` tool with the correct parameters from the docstring. It inferred France as the location based on Paris, and that it should use Celsius for the units of temperature.
A model **cannot actually call the tool itself**. It requests a tool call, and it's your job to handle the call and append it and the result to the chat history.
A model **cannot actually call the tool itself**. It requests a tool call, and it's your job to handle the call and append it and the result to the chat history. For
models that support [response parsing](./chat_response_parsing), the response parsing will be handled automatically, and you can just use
[`~PreTrainedTokenizer.parse_response] to extract the tool call. For other models, you'll need to manually translate the output
string into a tool call dict.
Hold the call in the `tool_calls` key of an `assistant` message. This is the recommended API, and should be supported by the chat template of most tool-using models.
Regardless of the approach you use, the tool call should go in the `tool_calls` key of an `assistant` message. This is the recommended API, and should be supported by the chat template of most tool-using models.
> [!WARNING]
> Although `tool_calls` is similar to the OpenAI API, the OpenAI API uses a JSON string as its `tool_calls` format. This may cause errors or strange model behavior if used in Transformers, which expects a dict.

View File

@ -0,0 +1,233 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Response Parsing
It is increasingly common for chat models to generate structured outputs, rather than just a single reply string.
The most common uses for structured outputs are [tool calling](./chat_extras) and [reasoning models](https://huggingface.co/reasoning-course).
Tool calling models can output tool calls, containing the name of the tool to call and any arguments to be passed to it,
while reasoning models often output reasoning steps as a "chain of thought". Some recent models even use both of these,
and may output reasoning and/or one or more tool calls before their final answer.
Models with structured outputs pose a challenge for chat templating, because the output needs to be parsed before it
can be appended to the chat. For a concrete example, let's say we ask [GPT-OSS](https://huggingface.co/openai/gpt-oss-120b)
what the weather is like, and it thinks and decides to call a tool. Here's what the raw model output might look like:
```txt
<|start|>analysis<|message|>The user asks: "What is the weather like in SF?" We need to get the location of the user? The user explicitly asks about SF (San Francisco).
So we need to get the current weather in San Francisco, CA. We need to call get_current_weather function. But we need to call function to get weather data.
So we should call get_current_weather with location "San Francisco, CA". Let's do that.
We will call function get_current_weather.<|end|><|start|>commentary to=functions.get_current_weather<|channel|>commentary <|constrain|>json<|message|>{"location":"San Francisco, CA"}<|call|>
}
```
But if you want to append this to a chat, you'll need to format it as a chat message dict, like this:
```json
{
"role": "assistant",
"thinking": "The user asks: \"What is the weather like in SF?\" We need to get the location of the user? The user explicitly asks about SF (San Francisco). So we need to get the current weather in San Francisco, CA. We need to call get_current_weather function. But we need to call function to get weather data. So we should call get_current_weather with location \"San Francisco, CA\". Let's do that.",
"tool_calls": [
{
"name": "get_current_weather",
"arguments": {
"location": "San Francisco, CA"
}
}
]
}
```
Chat **templates** give us a way to turn messages into formatted input for a model, but we need something else to
parse model output back into a standard message dict. This is what chat **parsing** is for.
## The [parse_response](~PreTrainedTokenizerBase.parse_response) method
Parsing a chat response on a model that supports it is straightforward. Simply take the raw, decoded output from
[generate](`~generation.GenerationMixin.generate`), and pass it to the tokenizer's [parse_response](~PreTrainedTokenizerBase.parse_response) method:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, dtype="auto", device_map="auto")
messages = [
{
"role": "user",
"content": "Hey! Can you summarize the end of the Cold War as briefly as possible? Like, comically briefly. It should really leave out almost most of the relevant information."
}
]
input_ids = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=1024)[0, input_ids.shape[1]:]
out_text = tokenizer.decode(outputs)
parsed = tokenizer.parse_response(out_text)
print(parsed.keys())
```
And you should get:
```text
dict_keys(['thinking', 'content'])
```
And that's all you need to start using response parsing! `parse_response` should return a complete message dict that is ready to be appended to the chat history.
When the tokenizer does not support response parsing, `parse_response` will throw an error. We hope to add support
to more tokenizers over time.
## Developers: Understanding a simple response schema
Under the hood, `parse_response` uses a **JSON schema** to parse the model output. A JSON schema represents
the structure of the output message dict. The schema is augmented with additional fields that indicate how the
output message string should be parsed into the expected format. Let's take a look at the schema for a SmolLM response,
excluding tool calls for now:
```python
{
"x-regex": "(?:<think>\n?(?P<thinking>.+?)\n?</think>)?\s*(?P<content>.+?)?\s*(?:<\|im_end\|>|$)",
"type": "object",
"properties": {
"role": {"const": "assistant"},
"content": {"type": "string"},
"thinking": {"type": "string"}
}
}
```
We can see that the schema describes a JSON "object" (a `dict`, in other words) with three keys: `role`, `content`, and `thinking`.
Because all assistant responses have the role "assistant", the `role` key is a `const`(ant). The other two keys are strings, extracted
from the named groups in the regex in the `x-regex` field.
Like chat templates, response schemas are set as a property of the tokenizer. To enable response parsing, all you need
to do is set `tokenizer.response_schema` to a valid schema dict, and `tokenizer.parse_response()` will work! Again, like
chat templates, this schema will be saved with the processor, so once you set it, you can use `save_pretrained()` or `push_to_hub()` to
save and share the schema.
## Developers: Complex schemas
Now, let's look at a more complex schema, which includes tool calls, to gain more of an understanding of the parser
internals. For this, we'll use the `GPT-OSS` schema. GPT-OSS emits both tool calls and thinking blocks, and it uses
an unusual format where model responses are tagged with one of three "channels": `commentary` for things like
tool calls, `analysis` for chain of thought blocks, and `final` for messages intended to be sent to the user.
A full message where the model calls a tool named `get_current_weather` might look like this, with some extra linebreaks added for clarity:
```text
<|channel|>analysis<|message|>
The user asks: "What is the weather like in SF?" So we need to get the current weather in San Francisco, CA.
We need to call get_current_weather function. So we should call get_current_weather with location "San Francisco, CA".
<|end|>
<|start|>assistant<|channel|>commentary
to=functions.get_current_weather <|constrain|>json<|message|>
{
"location": "San Francisco, CA"
}
<|call|>
```
Parsing proceeds recursively; the output of a regex (or other parser) at one level becomes the input to the nodes below it.
In other words, don't feel like you have to parse the entire output in one enormous regex! Instead, start with the schema,
and then add regexes to extract the relevant chunks as you go. Here's a schema that will parse it, with some
explanatory comments:
```python
{
"type": "object",
"properties": {
"role": {"const": "assistant"},
# "content" and "thinking" are both similar to the previous example, and just extract a single string
# However, rather than using a single regex with named groups to extract both, we use a regex in each subkey.
# When an object node has no parser/regex, the entire input string is passed to all of its children, so
# parsing can either be done with named groups at the object level, or with separate regexes at the property level.
"content": {"type": "string", "x-regex": r"<\|channel\|>final<\|message\|>(.*?)(?:<\|end\|>|$)"},
"thinking": {"type": "string", "x-regex": r"<\|channel\|>analysis<\|message\|>(.*?)<\|end\|>"},
"tool_calls": {
# "x-regex-iterator" uses re.findall to find multiple possible manages, and returns them as an
# array/list. You don't need to worry about array handling, though - each item in the array will be
# parsed by the `items` schema, so just write the schema for a single item.
"x-regex-iterator": r"<\|channel\|>commentary (to=functions\..*?<\|message\|>.*?)(?:<\|call\|>|$)",
"type": "array",
"items": {
"type": "object",
"properties": {
# A const property is a fixed value, and the input has no effect on it.
"type": {"const": "function"},
# Here, we wrap the entire tool call dict in a `{"function": ...}` block. The input string is passed through to it unchanged.
"function": {
"type": "object",
"properties": {
"name": {"type": "string", "x-regex": r"^to=functions\.(\w+)"},
"arguments": {
"type": "object",
"x-regex": "<\|message\|>(.*)",
# The "x-parser" field indicates that the extracted string should be parsed as JSON.
# The output is then passed to the schema nodes below and recursive parsing continues.
"x-parser": "json",
"additionalProperties": {"type": "any"},
},
},
},
},
},
},
},
}
```
## Developers: Understanding the parser logic
The parser follows a few simple rules:
1. Each level of the schema receives input from the level above, applies any regex or parser it has, and then passes the output to its children.
2. The root level receives the entire decoded model output string as input.
3. If a node has structured content after parsing (for example, if the regex has named groups and returns a dict, or if the parser returns a dict or list),
then that structured content is mapped to the node's children, and each child node receives its corresponding value as input.
4. If an `object` (dict) node has unstructured (string) output, then the entire string is passed to all of its children. This allows child nodes
to handle parsing individually rather than requiring a single parent regex to extract all keys at once.
5. If an `array` (list) node has unstructured (string) output, then this throws an error.
There is a small set of allowable `x-` keys that indicate how parsing should be done at each node:
- `x-regex`: A regex string to apply to the input. If the regex has named groups, the output is a dict of group names to values. Named groups should only be used in `object` nodes.
Otherwise, the regex must have exactly one unnamed capturing group, and the output is the value of that group as a string.
- `x-regex-iterator`: A regex string to apply to the input using `re.findall()`. The output is a list of all matches.
This should only be used in `array` nodes, and the regex must have exactly one unnamed capturing group. The output is distributed to
the node's `items` schema.
- `x-parser`: Calls a built-in parser to apply to the input. Currently, the only supported parser is `json`, which parses the input string as JSON.
The output is passed to the child nodes for further parsing. Note that the `json` parser can return deeply nested output - in this case, the output
will be progressively unwrapped as it is passed through child nodes. The child nodes do not need additional `x-parser` or `x-regex` fields in this case,
but their structure must match the structure of the parsed JSON.
- `x-parser-args`: Only allowed in conjunction with `x-parser`. This is a dict of additional arguments that control parsing. Right now, the only supported
argument is `transform`, which specifies a `jmespath` transformation to apply to the output. This is useful when the JSON parser returns a structure
that needs to be modified to match the schema.
- `x-regex-key-value`: This is rarely necessary, but it can be useful when parsing key-value pairs in non-JSON format where the names of the keys are not known
in advance, such as when a model emits XML tool calls with arbitrary argument names. The regex must have exactly two named capturing groups,
`key` and `value`, and the output is a dict mapping keys to values. This should only be used in `object` nodes.
In general, multiple regexes/parsers cannot be combined at the same level. The exception is that `x-regex`, returning a single string, can be combined with the other parsers. In this case,
`x-regex` is applied first, and then the output is passed to the other parser, either `x-regex-iterator`, `x-parser`, or `x-regex-key-value`.
Putting these ideas together, you can see that the input flows through the schema, being parsed at each level and then distributed to child nodes. Each level
only needs to extract the input content that is relevant for that part of the schema, and can then let its child nodes handle the rest. Internally, this is handled
with a parser function that receives input, applies any regexes/parsers at the current level, then maps the result to its child nodes before recursively calling itself on each of them.
Recursion terminates when it reaches leaf nodes, usually primitive types like `string` or `number`, which simply return the input they receive.

View File

@ -18,7 +18,7 @@ rendered properly in your Markdown viewer.
[ExecuTorch](https://pytorch.org/executorch/stable/index.html) runs PyTorch models on mobile and edge devices. Export your Transformers models to the ExecuTorch format with [Optimum ExecuTorch](https://github.com/huggingface/optimum-executorch) with the command below.
```
```bash
optimum-cli export executorch \
--model "HuggingFaceTB/SmolLM2-135M-Instruct" \
--task "text-generation" \

View File

@ -18,7 +18,6 @@ rendered properly in your Markdown viewer.
This page explains how the Rotary Embedding is computed and applied in Transformers and what types of RoPE are supported.
## Overview
Rotary Position Embeddings are a technique used to inject positional information into attention mechanisms without relying on explicit position encodings.
@ -35,11 +34,9 @@ The Transformers library provides a flexible and extensible implementation of va
| `"longrope"` | [LongRoPE](https://github.com/microsoft/LongRoPE) scaling as in Phi-2 model series. |
| `"llama3"` | RoPE scaling as in Llama3.1. |
## Configuration in Model Configs
# Configuration in Model Configs
To enable and customize rotary embeddings, add a `rope_parameters` field to your models configuration file (`config.json`). This field controls the RoPE behavior across model layers. Note that each RoPE variant defines its own set of expected keys and missing keys will raise an error. See the example below which creates a llama config with default RoPE parameters:
To enable and customize rotary embeddings, add a `rope_parameters` field to your models configuration file (`config.json`). This field controls the RoPE behavior across model layers. Note that each RoPE variant defines its own set of expected keys and missing keys will raise an error. See the example below which creates a llama config with default RoPE parameters:
```python
from transformers import LlamaConfig
@ -62,7 +59,6 @@ config.rope_parameters = {
Some models such as Gemma-3 use different layer types with different attention mechanisms, i.e. "full attention" in some blocks and "sliding-window attention" in others. Transformers supports specifying distinct RoPE parameters per layer type for these models. In this case, `rope_parameters` should be a nested dictionary, where top-level keys correspond to `config.layer_types` and values are per-type RoPE parameters. During model initialization, each decoder layer will automatically look up the matching RoPE configuration based on its declared layer type.
```python
from transformers import Gemma3Config
@ -81,9 +77,7 @@ config.rope_parameters = {
}
```
# Utilities
## Utilities
[[autodoc]] RopeParameters
- __call__

View File

@ -67,6 +67,6 @@ Examples of use can be found in the [example scripts](../examples) or [example n
[[autodoc]] data.data_collator.DataCollatorWithFlattening
# DataCollatorForMultipleChoice
## DataCollatorForMultipleChoice
[[autodoc]] data.data_collator.DataCollatorForMultipleChoice

View File

@ -50,14 +50,14 @@ several advanced alignment methods which can be used to map between the original
token space (e.g., getting the index of the token comprising a given character or the span of characters corresponding
to a given token).
# Multimodal Tokenizer
## Multimodal Tokenizer
Apart from that each tokenizer can be a "multimodal" tokenizer which means that the tokenizer will hold all relevant special tokens
as part of tokenizer attributes for easier access. For example, if the tokenizer is loaded from a vision-language model like LLaVA, you will
be able to access `tokenizer.image_token_id` to obtain the special image token used as a placeholder.
To enable extra special tokens for any type of tokenizer, you have to add the following lines and save the tokenizer. Extra special tokens do not
have to be modality related and can ne anything that the model often needs access to. In the below code, tokenizer at `output_dir` will have direct access
have to be modality related and can be anything that the model often needs access to. In the below code, tokenizer at `output_dir` will have direct access
to three more special tokens.
```python

View File

@ -31,7 +31,7 @@ This model was contributed by [Connor Henderson](https://huggingface.co/connor-h
FastSpeech2's general structure with a Mel-spectrogram decoder was implemented, and the traditional transformer blocks were replaced with conformer blocks as done in the ESPnet library.
#### FastSpeech2 Model Architecture
### FastSpeech2 Model Architecture
![FastSpeech2 Model Architecture](https://www.microsoft.com/en-us/research/uploads/prod/2021/04/fastspeech2-1.png)

View File

@ -33,7 +33,7 @@ this model, including [Alternating Updates][altup] (AltUp), [Learned Augmented R
[MatFormer][matformer], Per-Layer Embeddings (PLE), [Activation Sparsity with Statistical Top-k][spark-transformer], and KV cache sharing. The language model uses
a similar attention pattern to [Gemma 3](./gemma3) with alternating 4 local sliding window self-attention layers for
every global self-attention layer with a maximum context length of 32k tokens. Gemma 3n introduces
[MobileNet v5][mobilenetv5] as the vision encoder, using a default resolution of 768x768 pixels, and adds a newly
MobileNet v5 as the vision encoder, using a default resolution of 768x768 pixels, and adds a newly
trained audio encoder based on the [Universal Speech Model][usm] (USM) architecture.
The instruction-tuned variant was post-trained with knowledge distillation and reinforcement learning.

View File

@ -63,11 +63,6 @@ The attributes can be obtained from model config, as `model.config.num_query_tok
[[autodoc]] InstructBlipVideoVideoProcessor
- preprocess
## InstructBlipVideoImageProcessor
[[autodoc]] InstructBlipVideoImageProcessor
- preprocess
## InstructBlipVideoVisionModel
[[autodoc]] InstructBlipVideoVisionModel

View File

@ -88,16 +88,16 @@ processed_outputs = processor.post_process_keypoint_matching(outputs, image_size
import torch
from PIL import Image
import requests
processor = AutoImageProcessor.from_pretrained("ETH-CVG/lightglue_superpoint")
model = AutoModel.from_pretrained("ETH-CVG/lightglue_superpoint")
# LightGlue requires pairs of images
images = [image1, image2]
inputs = processor(images, return_tensors="pt")
with torch.inference_mode():
outputs = model(**inputs)
# Extract matching information
keypoints0 = outputs.keypoints0 # Keypoints in first image
keypoints1 = outputs.keypoints1 # Keypoints in second image
@ -112,7 +112,7 @@ processed_outputs = processor.post_process_keypoint_matching(outputs, image_size
# Process outputs for visualization
image_sizes = [[(image.height, image.width) for image in images]]
processed_outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for i, output in enumerate(processed_outputs):
print(f"For the image pair {i}")
for keypoint0, keypoint1, matching_score in zip(
@ -147,6 +147,13 @@ processed_outputs = processor.post_process_keypoint_matching(outputs, image_size
- post_process_keypoint_matching
- visualize_keypoint_matching
## LightGlueImageProcessorFast
[[autodoc]] LightGlueImageProcessorFast
- preprocess
- post_process_keypoint_matching
- visualize_keypoint_matching
## LightGlueForKeypointMatching
[[autodoc]] LightGlueForKeypointMatching

View File

@ -247,10 +247,6 @@ model = LlavaNextVideoForConditionalGeneration.from_pretrained(
[[autodoc]] LlavaNextVideoProcessor
## LlavaNextVideoImageProcessor
[[autodoc]] LlavaNextVideoImageProcessor
## LlavaNextVideoVideoProcessor
[[autodoc]] LlavaNextVideoVideoProcessor

View File

@ -54,7 +54,7 @@ model.set_output_embeddings(resized_embeddings)
## Usage Example
#### Instruct model
### Instruct model
```python
import torch
@ -80,7 +80,7 @@ output = model.generate(**inputs, max_new_tokens=25)
print(processor.decode(output[0]))
```
#### Base model
### Base model
```python
import requests

View File

@ -154,7 +154,7 @@ pip install schedulefree
[Schedule Free optimizer (SFO)](https://hf.co/papers/2405.15682) replaces the base optimizers momentum with a combination of averaging and interpolation. Unlike a traditional scheduler, SFO completely removes the need to anneal the learning rate.
SFO supports the RAdam (`schedule_free_radam`), AdamW (`schedule_free_adamw`) and SGD (`schedule_free_sgd`) optimizers. The RAdam scheduler doesn't require `warmup_steps` or `warmup_ratio`.
SFO supports the RAdam (`schedule_free_radam`), AdamW (`schedule_free_adamw`) and SGD (`schedule_free_sgd`) optimizers. The RAdam scheduler doesn't require `warmup_steps`.
By default, it is recommended to set `lr_scheduler_type="constant"`. Other `lr_scheduler_type` values may also work, but combining SFO optimizers with other learning rate schedules could affect SFOs intended behavior and performance.

View File

@ -33,7 +33,7 @@ Export a Transformers model to ONNX with the Optimum CLI or the `optimum.onnxrun
Run the command below to install Optimum and the [exporters](https://huggingface.co/docs/optimum/exporters/overview) module.
```bash
pip install optimum[exporters]
pip install optimum-onnx
```
> [!TIP]

View File

@ -383,6 +383,30 @@ transformers serve \
--attn_implementation "sdpa"
```
### Quantization
transformers serve is compatible with all [quantization methods](https://huggingface.co/docs/transformers/main/quantization/overview) supported in transformers. Quantization can significantly reduce memory usage and improve inference speed, with two main workflows: pre-quantized models and on-the-fly quantization.
#### Pre-quantized Models
For models that are already quantized (e.g., GPTQ, AWQ, bitsandbytes), simply choose a quantized model name for serving.
Make sure to install the required libraries listed in the quantization documentation.
> [!TIP]
> Pre-quantized models generally provide the best balance of performance and accuracy.
#### On the fly quantization
If you want to quantize a model at runtime, you can specify the --quantization flag in the CLI. Note that not all quantization methods support on-the-fly conversion. The full list of supported methods is available in the quantization [overview](https://huggingface.co/docs/transformers/main/quantization/overview).
Currently, with transformers serve, we only supports some methods: ["bnb-4bit", "bnb-8bit"]
For example, to enable 4-bit quantization with bitsandbytes, you need to pass add `--quantization bnb-4bit`:
```sh
transformers serve --quantization bnb-4bit
```
### Performance tips
- Use an efficient attention backend when available:
@ -397,6 +421,4 @@ transformers serve \
- `--dtype {bfloat16|float16}` typically improve throughput and memory use vs. `float32`
- `--load_in_4bit`/`--load_in_8bit` can reduce memory footprint for LoRA setups
- `--force-model <repo_id>` avoids per-request model hints and helps produce stable, repeatable runs

View File

@ -220,7 +220,7 @@ At this point, only three steps remain:
... gradient_accumulation_steps=4,
... per_device_eval_batch_size=32,
... num_train_epochs=10,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -211,7 +211,7 @@ At this point, only three steps remain:
... gradient_accumulation_steps=4,
... per_device_eval_batch_size=16,
... num_train_epochs=3,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -378,7 +378,7 @@ Most of the training arguments are self-explanatory, but one that is quite impor
... learning_rate=5e-5,
... per_device_train_batch_size=batch_size,
... per_device_eval_batch_size=batch_size,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -220,7 +220,7 @@ Al llegar a este punto, solo quedan tres pasos:
... gradient_accumulation_steps=4,
... per_device_eval_batch_size=32,
... num_train_epochs=10,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -200,8 +200,6 @@
title: モデル
- local: main_classes/text_generation
title: テキストの生成
- local: main_classes/onnx
title: ONNX
- local: main_classes/optimizer_schedules
title: 最適化
- local: main_classes/output

View File

@ -1292,7 +1292,7 @@ DeepSpeed は、`LRRangeTest`、`OneCycle`、`WarmupLR`、および`WarmupDecayL
したがって、スケジューラを設定しない場合、これがデフォルトで設定されるスケジューラになります。
設定ファイルで `scheduler` エントリを設定しない場合、[`Trainer`] は
`--lr_scheduler_type`、`--learning_rate`、および `--warmup_steps` または `--warmup_ratio` の値を設定します。
`--lr_scheduler_type`、`--learning_rate`、および `--warmup_steps` の値を設定します。
🤗 それのトランスフォーマーバージョン。
以下は、`WarmupLR`の自動構成された`scheduler`エントリの例です。
@ -1316,8 +1316,7 @@ DeepSpeed は、`LRRangeTest`、`OneCycle`、`WarmupLR`、および`WarmupDecayL
- `warmup_min_lr` の値は `0` です。
- `warmup_max_lr` と `--learning_rate` の値。
- `warmup_num_steps` と `--warmup_steps` の値 (指定されている場合)。それ以外の場合は `--warmup_ratio` を使用します
トレーニング ステップの数を乗算し、切り上げます。
- `warmup_num_steps` と `--warmup_steps` の値 (指定されている場合)
- `total_num_steps` には `--max_steps` の値を指定するか、指定されていない場合は実行時に自動的に導出されます。
環境、データセットのサイズ、およびその他のコマンド ライン引数 (
`WarmupDecayLR`)。

View File

@ -1,50 +0,0 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Exporting 🤗 Transformers models to ONNX
🤗 Transformers は `transformers.onnx` パッケージを提供します。
設定オブジェクトを利用することで、モデルのチェックポイントをONNXグラフに変換することができます。
詳細は[ガイド](../serialization) を参照してください。
を参照してください。
## ONNX Configurations
以下の3つの抽象クラスを提供しています。
エクスポートしたいモデルアーキテクチャのタイプに応じて、継承すべき3つの抽象クラスを提供します
* エンコーダーベースのモデルは [`~onnx.config.OnnxConfig`] を継承します。
* デコーダーベースのモデルは [`~onnx.config.OnnxConfigWithPast`] を継承します。
* エンコーダー・デコーダーモデルは [`~onnx.config.OnnxSeq2SeqConfigWithPast`] を継承しています。
### OnnxConfig
[[autodoc]] onnx.config.OnnxConfig
### OnnxConfigWithPast
[[autodoc]] onnx.config.OnnxConfigWithPast
### OnnxSeq2SeqConfigWithPast
[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast
## ONNX Features
各 ONNX 構成は、次のことを可能にする一連の _機能_ に関連付けられています。
さまざまなタイプのトポロジまたはタスクのモデルをエクスポートします。

View File

@ -47,7 +47,7 @@ ONNX形式にエクスポートされたモデルは、以下のように使用
🤗 TransformersモデルをONNXにエクスポートするには、まず追加の依存関係をインストールしてください
```bash
pip install optimum[exporters]
pip install optimum-onnx
```
すべての利用可能な引数を確認するには、[🤗 Optimumドキュメント](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli)を参照してください。または、コマンドラインでヘルプを表示することもできます:
@ -128,64 +128,3 @@ CLIの代わりに、🤗 TransformersモデルをONNXにプログラム的に
### Exporting a model for an unsupported architecture
現在エクスポートできないモデルをサポートするために貢献したい場合、まず[`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview)でサポートされているかどうかを確認し、サポートされていない場合は[🤗 Optimumに貢献](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute)してください。
### Exporting a model with `transformers.onnx`
<Tip warning={true}>
`transformers.onnx`はもはやメンテナンスされていないため、モデルを上記で説明したように🤗 Optimumでエクスポートしてください。このセクションは将来のバージョンで削除されます。
</Tip>
🤗 TransformersモデルをONNXにエクスポートするには、追加の依存関係をインストールしてください
```bash
pip install transformers[onnx]
```
`transformers.onnx`パッケージをPythonモジュールとして使用して、事前に用意された設定を使用してチェックポイントをエクスポートする方法は以下の通りです
```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
この方法は、`--model`引数で定義されたチェックポイントのONNXグラフをエクスポートします。🤗 Hubのいずれかのチェックポイントまたはローカルに保存されたチェックポイントを渡すことができます。エクスポートされた`model.onnx`ファイルは、ONNX標準をサポートする多くのアクセラレータで実行できます。例えば、ONNX Runtimeを使用してモデルを読み込んで実行する方法は以下の通りです
```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```
必要な出力名(例: `["last_hidden_state"]`は、各モデルのONNX構成を確認することで取得できます。例えば、DistilBERTの場合、次のようになります
```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```
ハブから純粋なTensorFlowのチェックポイントをプログラム的にエクスポートするプロセスは、以下のように同様です
```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```
ローカルに保存されたモデルをエクスポートする場合、モデルの重みとトークナイザのファイルを同じディレクトリに保存してください(例: `local-pt-checkpoint`)。その後、`transformers.onnx`パッケージの `--model`引数を希望するディレクトリに向けて設定して、ONNXにエクスポートします
```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```

View File

@ -219,7 +219,7 @@ MInDS-14 データセットのサンプリング レートは 8khz です (こ
... gradient_accumulation_steps=4,
... per_device_eval_batch_size=32,
... num_train_epochs=10,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -216,7 +216,7 @@ Datasets、🤗 データセット ライブラリから Food-101 データセ
... gradient_accumulation_steps=4,
... per_device_eval_batch_size=16,
... num_train_epochs=3,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -360,7 +360,7 @@ You should probably TRAIN this model on a down-stream task to be able to use it
... learning_rate=5e-5,
... per_device_train_batch_size=batch_size,
... per_device_eval_batch_size=batch_size,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -406,8 +406,6 @@
title: Models
- local: main_classes/text_generation
title: 텍스트 생성
- local: main_classes/onnx
title: ONNX
- local: main_classes/optimizer_schedules
title: 최적화
- local: main_classes/output

View File

@ -1,45 +0,0 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# 🤗 Transformers 모델을 ONNX로 내보내기[[exporting--transformers-models-to-onnx]]
🤗 트랜스포머는 `transformers.onnx` 패키지를 제공하며, 이 패키지는 설정 객체를 활용하여 모델 체크포인트를 ONNX 그래프로 변환할 수 있게 합니다.
🤗 Transformers에 대한 자세한 내용은 [이 가이드](../serialization)를 참조하세요.
## ONNX 설정[[onnx-configurations]]
내보내려는(export) 모델 아키텍처의 유형에 따라 상속받아야 할 세 가지 추상 클래스를 제공합니다:
* 인코더 기반 모델은 [`~onnx.config.OnnxConfig`]을 상속받습니다.
* 디코더 기반 모델은 [`~onnx.config.OnnxConfigWithPast`]을 상속받습니다.
* 인코더-디코더 기반 모델은 [`~onnx.config.OnnxSeq2SeqConfigWithPast`]을 상속받습니다.
### OnnxConfig[[transformers.onnx.OnnxConfig]]
[[autodoc]] onnx.config.OnnxConfig
### OnnxConfigWithPast[[transformers.onnx.OnnxConfigWithPast]]
[[autodoc]] onnx.config.OnnxConfigWithPast
### OnnxSeq2SeqConfigWithPast[[OnnxSeq2SeqConfigWithPast]]
[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast
## ONNX 특징[[onnx-features]]
각 ONNX 설정은 다양한 유형의 토폴로지나 작업에 대해 모델을 내보낼 수 있게(exporting) 해주는 _features_ 세트와 연관되어 있습니다.

View File

@ -154,7 +154,7 @@ pip install schedulefree
[Schedule Free optimizer (SFO)](https://hf.co/papers/2405.15682)는 기본 옵티마이저의 모멘텀 대신 평균화(averaging)와 보간(interpolation)을 조합하여 사용합니다. 덕분에 기존의 학습률 스케줄러와 달리, SFO는 학습률을 점진적으로 낮추는 절차가 아예 필요 없습니다.
SFO는 RAdam(`schedule_free_radam`), AdamW(`schedule_free_adamw`), SGD(`schedule_free_sgd`) 옵티마이저를 지원합니다. RAdam 스케줄러는 `warmup_steps``warmup_ratio` 설정이 필요하지 않습니다.
SFO는 RAdam(`schedule_free_radam`), AdamW(`schedule_free_adamw`), SGD(`schedule_free_sgd`) 옵티마이저를 지원합니다. RAdam 스케줄러는 `warmup_steps`.
기본적으로 `lr_scheduler_type="constant"`로 설정하는 것을 권장합니다. 다른 `lr_scheduler_type` 값도 동작할 순 있으나, SFO 옵티마이저와 다른 학습률 스케줄을 함께 사용하면 SFO의 의도된 동작과 성능에 영향을 줄 수 있습니다.

View File

@ -47,7 +47,7 @@ ONNX 형식으로 내보낸 모델은 다음과 같이 사용할 수 있습니
🤗 Transformers 모델을 ONNX로 내보내려면 먼저 추가 종속성을 설치하세요:
```bash
pip install optimum[exporters]
pip install optimum-onnx
```
사용 가능한 모든 인수를 확인하려면 [🤗 Optimum 문서](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli)를 참조하거나 명령줄에서 도움말을 보세요.
@ -123,59 +123,3 @@ CLI 대신에 `optimum.onnxruntime`을 사용하여 프로그래밍 방식으로
### 지원되지 않는 아키텍처의 모델 내보내기 [[exporting-a-model-for-an-unsupported-architecture]]
현재 내보낼 수 없는 모델을 지원하기 위해 기여하려면, 먼저 [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview)에서 지원되는지 확인한 후 지원되지 않는 경우에는 [🤗 Optimum에 기여](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute)하세요.
### `transformers.onnx`를 사용하여 모델 내보내기 [[exporting-a-model-with-transformersonnx]]
<Tip warning={true}>
`tranformers.onnx`는 더 이상 유지되지 않습니다. 위에서 설명한 대로 🤗 Optimum을 사용하여 모델을 내보내세요. 이 섹션은 향후 버전에서 제거될 예정입니다.
</Tip>
🤗 Transformers 모델을 ONNX로 내보내려면 추가 종속성을 설치하세요:
```bash
pip install transformers[onnx]
```
`transformers.onnx` 패키지를 Python 모듈로 사용하여 준비된 구성을 사용하여 체크포인트를 내보냅니다:
```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
이렇게 하면 `--model` 인수에 정의된 체크포인트의 ONNX 그래프가 내보내집니다. 🤗 Hub에서 제공하는 체크포인트나 로컬에 저장된 체크포인트를 전달할 수 있습니다. 결과로 생성된 `model.onnx` 파일은 ONNX 표준을 지원하는 많은 가속기 중 하나에서 실행할 수 있습니다. 예를 들어, 다음과 같이 ONNX Runtime을 사용하여 모델을 로드하고 실행할 수 있습니다:
```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```
필요한 출력 이름(예: `["last_hidden_state"]`)은 각 모델의 ONNX 구성을 확인하여 얻을 수 있습니다. 예를 들어, DistilBERT의 경우 다음과 같습니다:
```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```
Hub의 TensorFlow 체크포인트에 대해서도 동일한 프로세스가 적용됩니다. 예를 들어, 다음과 같이 순수한 TensorFlow 체크포인트를 내보냅니다:
```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```
로컬에 저장된 모델을 내보내려면 모델의 가중치 파일과 토크나이저 파일을 동일한 디렉토리에 저장한 다음, transformers.onnx 패키지의 --model 인수를 원하는 디렉토리로 지정하여 ONNX로 내보냅니다:
```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```

View File

@ -221,7 +221,7 @@ MinDS-14 데이터 세트의 샘플링 속도는 8khz이므로(이 정보는 [
... gradient_accumulation_steps=4,
... per_device_eval_batch_size=32,
... num_train_epochs=10,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -212,7 +212,7 @@ Hugging Face 계정에 로그인하여 모델을 업로드하고 커뮤니티에
... gradient_accumulation_steps=4,
... per_device_eval_batch_size=16,
... num_train_epochs=3,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -357,7 +357,7 @@ You should probably TRAIN this model on a down-stream task to be able to use it
... learning_rate=5e-5,
... per_device_train_batch_size=batch_size,
... per_device_eval_batch_size=batch_size,
... warmup_ratio=0.1,
... warmup_steps=0.1,
... logging_steps=10,
... load_best_model_at_end=True,
... metric_for_best_model="accuracy",

View File

@ -107,8 +107,6 @@
title: 模型
- local: main_classes/text_generation
title: 文本生成
- local: main_classes/onnx
title: ONNX
- local: main_classes/optimizer_schedules
title: Optimization
- local: main_classes/output

View File

@ -1206,7 +1206,7 @@ DeepSpeed支持`LRRangeTest`、`OneCycle`、`WarmupLR`和`WarmupDecayLR`学习
- 通过 `--lr_scheduler_type constant_with_warmup` 实现 `WarmupLR`
- 通过 `--lr_scheduler_type linear` 实现 `WarmupDecayLR`。这也是 `--lr_scheduler_type` 的默认值,因此,如果不配置调度器,这将是默认配置的调度器。
如果在配置文件中不配置 `scheduler` 条目,[`Trainer`] 将使用 `--lr_scheduler_type`、`--learning_rate` 和 `--warmup_steps` 或 `--warmup_ratio` 的值来配置其🤗 Transformers 版本。
如果在配置文件中不配置 `scheduler` 条目,[`Trainer`] 将使用 `--lr_scheduler_type`、`--learning_rate` 和 `--warmup_steps` 的值来配置其🤗 Transformers 版本。
以下是 `WarmupLR` 的自动配置示例:
@ -1227,7 +1227,7 @@ DeepSpeed支持`LRRangeTest`、`OneCycle`、`WarmupLR`和`WarmupDecayLR`学习
- `warmup_min_lr` 的值为 `0`。
- `warmup_max_lr` 的值为 `--learning_rate`。
- `warmup_num_steps` 的值为 `--warmup_steps`(如果提供)。否则,将使用 `--warmup_ratio` 乘以训练步骤的数量,并四舍五入。
- `warmup_num_steps` 的值为 `--warmup_steps`(如果提供)。
- `total_num_steps` 的值为 `--max_steps` 或者如果没有提供,将在运行时根据环境、数据集的大小和其他命令行参数(对于 `WarmupDecayLR` 来说需要)自动推导。
当然,您可以接管任何或所有的配置值,并自行设置这些值:

View File

@ -1,45 +0,0 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# 导出 🤗 Transformers 模型到 ONNX
🤗 Transformers提供了一个`transformers.onnx`通过利用配置对象您可以将模型checkpoints转换为ONNX图。
有关更多详细信息,请参阅导出 🤗 Transformers 模型的[指南](../serialization)。
## ONNX Configurations
我们提供了三个抽象类,取决于您希望导出的模型架构类型:
* 基于编码器的模型继承 [`~onnx.config.OnnxConfig`]
* 基于解码器的模型继承 [`~onnx.config.OnnxConfigWithPast`]
* 编码器-解码器模型继承 [`~onnx.config.OnnxSeq2SeqConfigWithPast`]
### OnnxConfig
[[autodoc]] onnx.config.OnnxConfig
### OnnxConfigWithPast
[[autodoc]] onnx.config.OnnxConfigWithPast
### OnnxSeq2SeqConfigWithPast
[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast
## ONNX Features
每个ONNX配置与一组 _特性_ 相关联,使您能够为不同类型的拓扑结构或任务导出模型。

View File

@ -47,7 +47,7 @@ rendered properly in your Markdown viewer.
要将 🤗 Transformers 模型导出为 ONNX首先需要安装额外的依赖项
```bash
pip install optimum[exporters]
pip install optimum-onnx
```
请参阅 [🤗 Optimum 文档](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) 以查看所有可用参数,或者在命令行中查看帮助:
@ -117,53 +117,3 @@ optimum-cli export onnx --model local_path --task question-answering distilbert_
### 导出尚未支持的架构的模型
如果你想要为当前无法导出的模型添加支持,请先检查 [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview) 是否支持该模型,如果不支持,你可以 [直接为 🤗 Optimum 贡献代码](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute)。
### 使用 `transformers.onnx` 导出模型
<Tip warning={true}>
`transformers.onnx` 不再进行维护,请如上所述,使用 🤗 Optimum 导出模型。这部分内容将在未来版本中删除。
</Tip>
要使用 `transformers.onnx` 将 🤗 Transformers 模型导出为 ONNX请安装额外的依赖项
```bash
pip install transformers[onnx]
```
`transformers.onnx` 包作为 Python 模块使用,以使用现成的配置导出检查点:
```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
以上代码将导出由 `--model` 参数定义的检查点的 ONNX 图。传入任何 🤗 Hub 上或者存储与本地的检查点。生成的 `model.onnx` 文件可以在支持 ONNX 标准的众多加速引擎上运行。例如,使用 ONNX Runtime 加载并运行模型,如下所示:
```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```
可以通过查看每个模型的 ONNX 配置来获取所需的输出名(例如 `["last_hidden_state"]`)。例如,对于 DistilBERT可以用以下代码获取输出名称
```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```
要导出本地存储的模型,请将模型的权重和分词器文件保存在同一目录中(例如 `local-pt-checkpoint`),然后通过将 `transformers.onnx` 包的 `--model` 参数指向该目录,将其导出为 ONNX
```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```

View File

@ -151,7 +151,6 @@ def main():
if dist.is_initialized() and dp_mesh.size() > 1:
model = FSDP(model, device_mesh=dp_mesh, sharding_strategy=ShardingStrategy.NO_SHARD)
use_ddp = True
pass
model.train()

View File

@ -122,7 +122,7 @@ class GLUETransformer(BaseTransformer):
out_label_list = [[] for _ in range(out_label_ids.shape[0])]
preds_list = [[] for _ in range(out_label_ids.shape[0])]
results = {**{"val_loss": val_loss_mean}, **compute_metrics(self.hparams.task, preds, out_label_ids)}
results = {"val_loss": val_loss_mean, **compute_metrics(self.hparams.task, preds, out_label_ids)}
ret = dict(results.items())
ret["log"] = results

View File

@ -125,15 +125,23 @@ def token_type_ids_mask_function(
# If it's 1 for both query and key/value, we are in an image block
# NOTE: static cache shape goes beyond input seq length, while token_type_ids.shape[1] == input seq length
# Since vmap doesn't support `if statement` we workaround it with `torch.where`
safe_idx = torch.where(kv_idx < token_type_ids.shape[1], kv_idx, 0)
token_type_ids_at_kv_idx = token_type_ids[batch_idx, safe_idx]
safe_q_idx = torch.where(q_idx < token_type_ids.shape[1], q_idx, 0)
safe_kv_idx = torch.where(kv_idx < token_type_ids.shape[1], kv_idx, 0)
token_type_ids_at_q_idx = token_type_ids[batch_idx, safe_q_idx]
token_type_ids_at_q_idx = torch.where(q_idx < token_type_ids.shape[1], token_type_ids_at_q_idx, 0)
token_type_ids_at_kv_idx = token_type_ids[batch_idx, safe_kv_idx]
token_type_ids_at_kv_idx = torch.where(kv_idx < token_type_ids.shape[1], token_type_ids_at_kv_idx, 0)
image_group_ids_at_kv_idx = image_group_ids[batch_idx, safe_idx]
image_group_ids_at_q_idx = image_group_ids[batch_idx, safe_q_idx]
image_group_ids_at_q_idx = torch.where(q_idx < image_group_ids.shape[1], image_group_ids_at_q_idx, -1)
image_group_ids_at_kv_idx = image_group_ids[batch_idx, safe_kv_idx]
image_group_ids_at_kv_idx = torch.where(kv_idx < image_group_ids.shape[1], image_group_ids_at_kv_idx, -1)
is_image_block = (token_type_ids[batch_idx, q_idx] == 1) & (token_type_ids_at_kv_idx == 1)
same_image_block = image_group_ids[batch_idx, q_idx] == image_group_ids_at_kv_idx
is_image_block = (token_type_ids_at_q_idx == 1) & (token_type_ids_at_kv_idx == 1)
same_image_block = image_group_ids_at_q_idx == image_group_ids_at_kv_idx
# This is bidirectional attention whenever we are dealing with image tokens
return is_image_block & same_image_block

View File

@ -41,7 +41,7 @@ python run_audio_classification.py \
--learning_rate 3e-5 \
--max_length_seconds 1 \
--attention_mask False \
--warmup_ratio 0.1 \
--warmup_steps 0.1 \
--num_train_epochs 5 \
--per_device_train_batch_size 32 \
--gradient_accumulation_steps 4 \
@ -82,7 +82,7 @@ python run_audio_classification.py \
--learning_rate 3e-4 \
--max_length_seconds 16 \
--attention_mask False \
--warmup_ratio 0.1 \
--warmup_steps 0.1 \
--num_train_epochs 10 \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 4 \

View File

@ -165,7 +165,7 @@ python run_mae.py \
--lr_scheduler_type cosine \
--weight_decay 0.05 \
--num_train_epochs 800 \
--warmup_ratio 0.05 \
--warmup_steps 0.05 \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 8 \
--logging_strategy steps \

View File

@ -21,12 +21,12 @@ A useful guide for English-Chinese translation of Hugging Face documentation
Dictionary
Hugging Face: 抱抱脸
Hugging Face: Hugging Face不翻译
token: 词符(并用括号标注原英文)
tokenize: 词符化(并用括号标注原英文)
tokenizer: 词符化器(并用括号标注原英文)
transformer: transformer不翻译
pipeline: 流水线
pipeline: pipeline不翻译
API: API (不翻译)
inference: 推理
Trainer: 训练器。当作为类名出现时不翻译。
@ -107,9 +107,9 @@ checkpoint: 检查点
- [用 DistilBERT 做问答](https://huggingface.co/distilbert/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [用 T5 做翻译](https://huggingface.co/google-t5/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
**[Write With Transformer](https://transformer.huggingface.co)**,由抱抱脸团队打造,是一个文本生成的官方 demo。
**[Write With Transformer](https://transformer.huggingface.co)**,由 Hugging Face 团队打造,是一个文本生成的官方 demo。
## 如果你在寻找由抱抱脸团队提供的定制化支持服务
## 如果你在寻找由 Hugging Face 团队提供的定制化支持服务
<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
@ -117,25 +117,25 @@ checkpoint: 检查点
## 快速上手
我们为快速使用模型提供了 `pipeline` 流水线API。流水线聚合了预训练模型和对应的文本预处理。下面是一个快速使用流水线去判断正负面情绪的例子:
我们为快速使用模型提供了 `pipeline` API。Pipeline 聚合了预训练模型和对应的文本预处理。下面是一个快速使用 pipeline 去判断正负面情绪的例子:
```python
>>> from transformers import pipeline
# 使用情绪分析流水线
# 使用情绪分析 pipeline
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
```
第二行代码下载并缓存了流水线使用的预训练模型,而第三行代码则在给定的文本上进行了评估。这里的答案正面 (positive) 具有 99 的置信度。
第二行代码下载并缓存了 pipeline 使用的预训练模型,而第三行代码则在给定的文本上进行了评估。这里的答案"正面" (positive) 具有 99 的置信度。
许多的 NLP 任务都有开箱即用的预训练流水线。比如说,我们可以轻松的从给定文本中抽取问题答案:
许多的 NLP 任务都有开箱即用的预训练 `pipeline`。比如说,我们可以轻松的从给定文本中抽取问题答案:
``` python
>>> from transformers import pipeline
# 使用问答流水线
# 使用问答 pipeline
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
... 'question': 'What is the name of the repository ?',
@ -145,7 +145,7 @@ checkpoint: 检查点
```
除了给出答案,预训练模型还给出了对应的置信度分数、答案在词符化 (tokenized) 后的文本中开始和结束的位置。你可以从[这个教程](https://huggingface.co/docs/transformers/task_summary)了解更多流水线API支持的任务。
除了给出答案,预训练模型还给出了对应的置信度分数、答案在词符化 (tokenized) 后的文本中开始和结束的位置。你可以从[这个教程](https://huggingface.co/docs/transformers/task_summary)了解更多 `pipeline` API 支持的任务。
要在你的任务上下载和使用任意预训练模型也很简单,只需三行代码。这里是 PyTorch 版的示例:
```python
@ -193,11 +193,11 @@ checkpoint: 检查点
1. 为你的需求轻松定制专属模型和用例:
- 我们为每种模型架构提供了多个用例来复现原论文结果
- 模型内部结构保持透明一致
- 模型文件可单独使用,方便改和快速实验
- 模型文件可单独使用,方便改和快速实验
## 什么情况下我不该用 transformers
- 本库并不是模块化的神经网络工具箱。模型文件中的代码特意呈若璞玉,未经额外抽象封装,以便研究人员快速迭代改而不致溺于抽象和文件跳转之中。
- 本库并不是模块化的神经网络工具箱。模型文件中的代码特意呈若璞玉,未经额外抽象封装,以便研究人员快速迭代改而不致溺于抽象和文件跳转之中。
- `Trainer` API 并非兼容任何模型,只为本库之模型优化。若是在寻找适用于通用机器学习的训练循环实现,请另觅他库。
- 尽管我们已尽力而为,[examples 目录](https://github.com/huggingface/transformers/tree/main/examples)中的脚本也仅为用例而已。对于你的特定问题,它们并不一定开箱即用,可能需要改几行代码以适之。

View File

@ -30,10 +30,10 @@ You can open any page of the documentation as a notebook in Colab (there is a bu
| Notebook | Description | | | |
|:----------|:-------------|:-------------|:------------|------:|
| [Quicktour of the library](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/quicktour.ipynb) | A presentation of the various APIs in Transformers |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/quicktour.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/en/transformers_doc/quicktour.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/transformers_doc/en/quicktour.ipynb )|
| [Summary of the tasks](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/task_summary.ipynb) | How to run the models of the Transformers library task by task |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/task_summary.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/transformers_doc/en/task_summary.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/transformers_doc/en/task_summary.ipynb )|
| [Preprocessing data](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/preprocessing.ipynb) | How to use a tokenizer to preprocess your data |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/preprocessing.ipynb) | [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/transformers_doc/en/preprocessing.ipynb)|[![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/transformers_doc/en/preprocessing.ipynb )|
| [Fine-tuning a pretrained model](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/training.ipynb) | How to use the Trainer to fine-tune a pretrained model |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/training.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/transformers_doc/en/training.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/transformers_doc/en/training.ipynb )|
| [Quicktour of the library](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/quicktour.ipynb) | A presentation of the various APIs in Transformers |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/quicktour.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/en/transformers_doc/quicktour.ipynb)| |
| [Summary of the tasks](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/task_summary.ipynb) | How to run the models of the Transformers library task by task |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/task_summary.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/transformers_doc/en/task_summary.ipynb)| |
| [Preprocessing data](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/preprocessing.ipynb) | How to use a tokenizer to preprocess your data |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/preprocessing.ipynb) | [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/transformers_doc/en/preprocessing.ipynb)||
| [Fine-tuning a pretrained model](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/training.ipynb) | How to use the Trainer to fine-tune a pretrained model |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/training.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/transformers_doc/en/training.ipynb)| |
| [Summary of the tokenizers](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/tokenizer_summary.ipynb) | The differences between the tokenizers algorithm |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/tokenizer_summary.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/transformers_doc/en/tokenizer_summary.ipynb)|[![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/transformers_doc/en/tokenizer_summary.ipynb )|
| [Multilingual models](https://github.com/huggingface/notebooks/blob/main/transformers_doc/en/multilingual.ipynb) | How to use the multilingual models of the library |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/transformers_doc/en/multilingual.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/transformers_doc/en/multilingual.ipynb)|[![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/transformers_doc/en/multilingual.ipynb)|
@ -45,16 +45,16 @@ You can open any page of the documentation as a notebook in Colab (there is a bu
|:----------|:-------------|:-------------|:-------------|------:|
| [Train your tokenizer](https://github.com/huggingface/notebooks/blob/main/examples/tokenizer_training.ipynb) | How to train and use your very own tokenizer |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tokenizer_training.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/tokenizer_training.ipynb)|[![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/examples/tokenizer_training.ipynb)|
| [Train your language model](https://github.com/huggingface/notebooks/blob/main/examples/language_modeling_from_scratch.ipynb) | How to easily start using transformers |[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling_from_scratch.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/language_modeling_from_scratch.ipynb)|[![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/examples/language_modeling_from_scratch.ipynb)|
| [How to fine-tune a model on text classification](https://github.com/huggingface/notebooks/blob/main/examples/text_classification.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on any GLUE task. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb)|
| [How to fine-tune a model on text classification](https://github.com/huggingface/notebooks/blob/main/examples/text_classification.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on any GLUE task. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb)| |
| [How to fine-tune a model on language modeling](https://github.com/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on a causal or masked LM task. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)|
| [How to fine-tune a model on token classification](https://github.com/huggingface/notebooks/blob/main/examples/token_classification.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on a token classification task (NER, PoS). | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb)|
| [How to fine-tune a model on question answering](https://github.com/huggingface/notebooks/blob/main/examples/question_answering.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on SQUAD. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb)|
| [How to fine-tune a model on multiple choice](https://github.com/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on SWAG. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)|
| [How to fine-tune a model on translation](https://github.com/huggingface/notebooks/blob/main/examples/translation.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on WMT. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/translation.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/examples/translation.ipynb)|
| [How to fine-tune a model on summarization](https://github.com/huggingface/notebooks/blob/main/examples/summarization.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on XSUM. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)|
| [How to train a language model from scratch](https://github.com/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| Highlight all the steps to effectively train Transformer model on custom data | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/notebooks/01_how_to_train.ipynb)|
| [How to generate text](https://github.com/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| How to use different decoding methods for language generation with transformers | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/notebooks/02_how_to_generate.ipynb)|
| [Reformer](https://github.com/huggingface/blog/blob/main/notebooks/03_reformer.ipynb)| How Reformer pushes the limits of language modeling | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)| [![Open in AMD Dev Cloud](https://oneclickamd.ai/static/amd.svg?v=2)](http://oneclickamd.ai/github/huggingface/notebooks/blob/main/notebooks/03_reformer.ipynb)|
| [How to fine-tune a model on token classification](https://github.com/huggingface/notebooks/blob/main/examples/token_classification.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on a token classification task (NER, PoS). | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb)| |
| [How to fine-tune a model on question answering](https://github.com/huggingface/notebooks/blob/main/examples/question_answering.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on SQUAD. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb)| |
| [How to fine-tune a model on multiple choice](https://github.com/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on SWAG. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)| |
| [How to fine-tune a model on translation](https://github.com/huggingface/notebooks/blob/main/examples/translation.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on WMT. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/translation.ipynb)| |
| [How to fine-tune a model on summarization](https://github.com/huggingface/notebooks/blob/main/examples/summarization.ipynb)| Show how to preprocess the data and fine-tune a pretrained model on XSUM. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)| |
| [How to train a language model from scratch](https://github.com/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| Highlight all the steps to effectively train Transformer model on custom data | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| |
| [How to generate text](https://github.com/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| How to use different decoding methods for language generation with transformers | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| |
| [Reformer](https://github.com/huggingface/blog/blob/main/notebooks/03_reformer.ipynb)| How Reformer pushes the limits of language modeling | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)| |
#### Computer Vision[[pytorch-cv]]

View File

@ -117,6 +117,7 @@ _deps = [
"importlib_metadata",
"ipadic>=1.0.0,<2.0",
"jinja2>=3.1.0",
"jmespath>=1.0.1",
"kenlm",
"kernels>=0.10.2,<0.11",
"librosa",
@ -169,7 +170,7 @@ _deps = [
"tiktoken",
"timm<=1.0.19,!=1.0.18",
"tokenizers>=0.22.0,<=0.23.0",
"torch>=2.2,<2.9",
"torch>=2.2",
"torchaudio",
"torchvision",
"pyctcdecode>=0.4.0",
@ -294,7 +295,7 @@ extras["num2words"] = deps_list("num2words")
extras["sentencepiece"] = deps_list("sentencepiece", "protobuf")
extras["tiktoken"] = deps_list("tiktoken", "blobfile")
extras["mistral-common"] = deps_list("mistral-common[opencv]")
extras["chat_template"] = deps_list("jinja2")
extras["chat_template"] = deps_list("jinja2", "jmespath")
extras["testing"] = (
deps_list(
"pytest",

View File

@ -129,8 +129,6 @@ _import_structure = {
],
"loss": [],
"modelcard": ["ModelCard"],
# Models
"onnx": [],
"pipelines": [
"AudioClassificationPipeline",
"AutomaticSpeechRecognitionPipeline",

View File

@ -249,7 +249,7 @@ class Chat:
# Generation settings
config = load_generation_config(generation_config)
config.update(**{"do_sample": True, "max_new_tokens": 256}) # some default values
config.update(do_sample=True, max_new_tokens=256) # some default values
config.update(**parse_generate_flags(generate_flags))
self.config = config

View File

@ -51,7 +51,7 @@ def run(
Optional[str],
typer.Option(help="Name of the column to use as input. For multi columns input use 'column1,columns2'"),
] = None,
format: Annotated[FormatEnum, typer.Option(help="Input format to read from", case_sensitive=False)] = "infer", # type: ignore
format: Annotated[FormatEnum, typer.Option(help="Input format to read from", case_sensitive=False)] = "pipe", # type: ignore
device: Annotated[
int, typer.Option(help="Indicate the device to run onto, -1 indicates CPU, >= 0 indicates GPU.")
] = -1,

View File

@ -377,14 +377,10 @@ class Serve:
help="Which attention implementation to use; you can run --attn_implementation=flash_attention_2, in which case you must install this manually by running `pip install flash-attn --no-build-isolation`."
),
] = None,
load_in_8bit: Annotated[
bool, typer.Option(help="Whether to use 8 bit precision for the base model - works only with LoRA.")
] = False,
load_in_4bit: Annotated[
bool, typer.Option(help="Whether to use 4 bit precision for the base model - works only with LoRA.")
] = False,
bnb_4bit_quant_type: Annotated[str, typer.Option(help="Quantization type.")] = "nf4",
use_bnb_nested_quant: Annotated[bool, typer.Option(help="Whether to use nested quantization.")] = False,
quantization: Annotated[
Optional[str],
typer.Option(help="Which quantization method to use. choices: 'bnb-4bit', 'bnb-8bit'"),
] = None,
host: Annotated[str, typer.Option(help="Interface the server will listen to.")] = "localhost",
port: Annotated[int, typer.Option(help="Port the server will listen to.")] = 8000,
model_timeout: Annotated[
@ -424,10 +420,7 @@ class Serve:
self.dtype = dtype
self.trust_remote_code = trust_remote_code
self.attn_implementation = attn_implementation
self.load_in_8bit = load_in_8bit
self.load_in_4bit = load_in_4bit
self.bnb_4bit_quant_type = bnb_4bit_quant_type
self.use_bnb_nested_quant = use_bnb_nested_quant
self.quantization = quantization
self.host = host
self.port = port
self.model_timeout = model_timeout
@ -1688,22 +1681,20 @@ class Serve:
Returns:
`Optional[BitsAndBytesConfig]`: The quantization config.
"""
if self.load_in_4bit:
if self.quantization == "bnb-4bit":
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
# For consistency with model weights, we use the same value as `dtype`
bnb_4bit_compute_dtype=self.dtype,
bnb_4bit_quant_type=self.bnb_4bit_quant_type,
bnb_4bit_use_double_quant=self.use_bnb_nested_quant,
bnb_4bit_quant_storage=self.dtype,
)
elif self.load_in_8bit:
quantization_config = BitsAndBytesConfig(
load_in_8bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
)
elif self.quantization == "bnb-8bit":
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
else:
quantization_config = None
if quantization_config is not None:
logger.info(f"Quantization applied with the following config: {quantization_config}")
return quantization_config
def process_model_name(self, model_id: str) -> str:
@ -1750,7 +1741,6 @@ class Serve:
revision=revision,
trust_remote_code=self.trust_remote_code,
)
dtype = self.dtype if self.dtype in ["auto", None] else getattr(torch, self.dtype)
quantization_config = self.get_quantization_config()
@ -1758,19 +1748,15 @@ class Serve:
"revision": revision,
"attn_implementation": self.attn_implementation,
"dtype": dtype,
"device_map": "auto",
"device_map": self.device,
"trust_remote_code": self.trust_remote_code,
"quantization_config": quantization_config,
}
if quantization_config is not None:
model_kwargs["quantization_config"] = quantization_config
config = AutoConfig.from_pretrained(model_id, **model_kwargs)
architecture = getattr(transformers, config.architectures[0])
model = architecture.from_pretrained(model_id, **model_kwargs)
if getattr(model, "hf_device_map", None) is None:
model = model.to(self.device)
has_default_max_length = (
model.generation_config.max_new_tokens is None and model.generation_config.max_length == 20
)

View File

@ -390,7 +390,7 @@ class PreTrainedConfig(PushToHubMixin):
return self._attn_implementation_internal
@_attn_implementation.setter
def _attn_implementation(self, value: Optional[Union[str, dict]]):
def _attn_implementation(self, value: str | dict | None):
"""We set it recursively on the sub-configs as well"""
# Set if for current config
current_attn = getattr(self, "_attn_implementation", None)
@ -425,7 +425,7 @@ class PreTrainedConfig(PushToHubMixin):
def rope_scaling(self, value):
self.rope_parameters = value
def save_pretrained(self, save_directory: Union[str, os.PathLike], push_to_hub: bool = False, **kwargs):
def save_pretrained(self, save_directory: str | os.PathLike, push_to_hub: bool = False, **kwargs):
"""
Save a configuration object to the directory `save_directory`, so that it can be re-loaded using the
[`~PreTrainedConfig.from_pretrained`] class method.
@ -490,11 +490,11 @@ class PreTrainedConfig(PushToHubMixin):
@classmethod
def from_pretrained(
cls: type[SpecificPreTrainedConfigType],
pretrained_model_name_or_path: Union[str, os.PathLike],
cache_dir: Optional[Union[str, os.PathLike]] = None,
pretrained_model_name_or_path: str | os.PathLike,
cache_dir: str | os.PathLike | None = None,
force_download: bool = False,
local_files_only: bool = False,
token: Optional[Union[str, bool]] = None,
token: str | bool | None = None,
revision: str = "main",
**kwargs,
) -> SpecificPreTrainedConfigType:
@ -597,7 +597,7 @@ class PreTrainedConfig(PushToHubMixin):
@classmethod
def get_config_dict(
cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs
cls, pretrained_model_name_or_path: str | os.PathLike, **kwargs
) -> tuple[dict[str, Any], dict[str, Any]]:
"""
From a `pretrained_model_name_or_path`, resolve to a dictionary of parameters, to be used for instantiating a
@ -630,7 +630,7 @@ class PreTrainedConfig(PushToHubMixin):
@classmethod
def _get_config_dict(
cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs
cls, pretrained_model_name_or_path: str | os.PathLike, **kwargs
) -> tuple[dict[str, Any], dict[str, Any]]:
cache_dir = kwargs.pop("cache_dir", None)
force_download = kwargs.pop("force_download", False)
@ -793,7 +793,7 @@ class PreTrainedConfig(PushToHubMixin):
@classmethod
def from_json_file(
cls: type[SpecificPreTrainedConfigType], json_file: Union[str, os.PathLike]
cls: type[SpecificPreTrainedConfigType], json_file: str | os.PathLike
) -> SpecificPreTrainedConfigType:
"""
Instantiates a [`PreTrainedConfig`] from the path to a JSON file of parameters.
@ -810,7 +810,7 @@ class PreTrainedConfig(PushToHubMixin):
return cls(**config_dict)
@classmethod
def _dict_from_json_file(cls, json_file: Union[str, os.PathLike]):
def _dict_from_json_file(cls, json_file: str | os.PathLike):
with open(json_file, encoding="utf-8") as reader:
text = reader.read()
return json.loads(text)
@ -935,7 +935,7 @@ class PreTrainedConfig(PushToHubMixin):
config_dict = self.to_dict()
return json.dumps(config_dict, indent=2, sort_keys=True) + "\n"
def to_json_file(self, json_file_path: Union[str, os.PathLike], use_diff: bool = True):
def to_json_file(self, json_file_path: str | os.PathLike, use_diff: bool = True):
"""
Save this instance to a JSON file.

View File

@ -27,6 +27,7 @@ deps = {
"importlib_metadata": "importlib_metadata",
"ipadic": "ipadic>=1.0.0,<2.0",
"jinja2": "jinja2>=3.1.0",
"jmespath": "jmespath>=1.0.1",
"kenlm": "kenlm",
"kernels": "kernels>=0.10.2,<0.11",
"librosa": "librosa",
@ -76,7 +77,7 @@ deps = {
"tiktoken": "tiktoken",
"timm": "timm<=1.0.19,!=1.0.18",
"tokenizers": "tokenizers>=0.22.0,<=0.23.0",
"torch": "torch>=2.2,<2.9",
"torch": "torch>=2.2",
"torchaudio": "torchaudio",
"torchvision": "torchvision",
"pyctcdecode": "pyctcdecode>=0.4.0",

View File

@ -31,7 +31,6 @@ class CacheAllocator(ABC):
def allocate_blocks(self, n_blocks: int, request_id: str, free_blocks: deque[int]) -> Optional[int]:
"""Allocates n_blocks for a given request_id. Returns the num of blocks allocated if successful and None
otherwise."""
pass
def free_blocks(self, request_id: str, free_blocks: deque[int]) -> None:
"""Frees all blocks associated with a request_id."""
@ -46,17 +45,14 @@ class CacheAllocator(ABC):
@abstractmethod
def get_read_indices(self, request_id: str, past_length: int, query_length: int) -> list[int]:
"""Returns the physical indices of where to read request_id's cache in the cache tensor."""
pass
@abstractmethod
def get_write_indices(self, request_id: str, past_length: int, query_length: int) -> list[int]:
"""Returns the physical indices of where to write request_id's cache in the cache tensor."""
pass
@abstractmethod
def get_seqlens_k(self, request_id: str, past_length: int, query_length: int) -> tuple[str, int]:
"""Returns the attention type of the cache allocator and the key sequence length for the given request_id."""
pass
class FullAttentionCacheAllocator(CacheAllocator):

View File

@ -27,7 +27,6 @@ from ...utils.metrics import traced
logger = logging.getLogger("ContinuousBatchingLogger")
@staticmethod
def get_device_and_memory_breakdown() -> tuple[torch.device, int, int, int]:
if torch.cuda.is_available():
device = torch.device("cuda")

View File

@ -53,7 +53,6 @@ class Scheduler(ABC):
"""Schedules requests for the next batch based on available token budget. This method selects which requests
should be processed in the current batch, considering the token budget and the scheduler's prioritization rules.
The token_budget is the maximum number of tokens that can be processed in this batch."""
pass
@traced
def has_pending_requests(self) -> bool:

View File

@ -410,7 +410,6 @@ class GenerationMixin(ContinuousMixin):
logger.info(
"Generation config file not found, using a generation config created from the model config."
)
pass
# Load custom generate function if `pretrained_model_name_or_path` defines it (and override `generate`)
if hasattr(self, "load_custom_generate"):
try:
@ -1941,7 +1940,7 @@ class GenerationMixin(ContinuousMixin):
"minimax",
"xlnet",
"lfm2",
"lfm2-vl",
"lfm2_vl",
]
)

View File

@ -171,7 +171,6 @@ class TorchExportableModuleForVLM:
Returns:
Output with logits for text generation
"""
pass
def generate(
self, pixel_values=None, input_ids=None, max_new_tokens=50, do_sample=False, temperature=1.0, **kwargs
@ -189,7 +188,6 @@ class TorchExportableModuleForVLM:
Returns:
Generated sequences
"""
pass
class TorchExportableModuleForDecoderOnlyLM(torch.nn.Module):

View File

@ -162,8 +162,8 @@ except ImportError:
raise RuntimeError("register_kernel_mapping requires `kernels` to be installed. Run `pip install kernels`.")
_HUB_KERNEL_MAPPING: dict[str, str] = {
"causal-conv1d": "kernels-community/causal-conv1d",
_HUB_KERNEL_MAPPING: dict[str, dict[str, str]] = {
"causal-conv1d": {"repo_id": "kernels-community/causal-conv1d"},
}
_KERNEL_MODULE_MAPPING: dict[str, Optional[ModuleType]] = {}
@ -242,7 +242,9 @@ def lazy_load_kernel(kernel_name: str, mapping: dict[str, Optional[ModuleType]]
from kernels import get_kernel
try:
kernel = get_kernel(_HUB_KERNEL_MAPPING[kernel_name])
repo_id = _HUB_KERNEL_MAPPING[kernel_name]["repo_id"]
version = _HUB_KERNEL_MAPPING[kernel_name].get("version", None)
kernel = get_kernel(repo_id, version=version)
mapping[kernel_name] = kernel
except FileNotFoundError:
mapping[kernel_name] = None

View File

@ -737,7 +737,7 @@ class WandbCallback(TrainerCallback):
combined_dict = {**model_config, **combined_dict}
if hasattr(model, "peft_config") and model.peft_config is not None:
peft_config = model.peft_config
combined_dict = {**{"peft_config": peft_config}, **combined_dict}
combined_dict = {"peft_config": peft_config, **combined_dict}
trial_name = state.trial_name
init_args = {}
if trial_name is not None:
@ -982,7 +982,7 @@ class TrackioCallback(TrainerCallback):
combined_dict = {**model_config, **combined_dict}
if hasattr(model, "peft_config") and model.peft_config is not None:
peft_config = model.peft_config
combined_dict = {**{"peft_config": peft_config}, **combined_dict}
combined_dict = {"peft_config": peft_config, **combined_dict}
self._trackio.init(
project=project,
@ -2246,7 +2246,7 @@ class SwanLabCallback(TrainerCallback):
combined_dict = {**model_config, **combined_dict}
if hasattr(model, "peft_config") and model.peft_config is not None:
peft_config = model.peft_config
combined_dict = {**{"peft_config": peft_config}, **combined_dict}
combined_dict = {"peft_config": peft_config, **combined_dict}
trial_name = state.trial_name
init_args = {}
if trial_name is not None and args.run_name is not None:

View File

@ -628,7 +628,7 @@ def maybe_load_adapters(
**adapter_kwargs,
):
if pretrained_model_name_or_path is None or not is_peft_available():
return None, pretrained_model_name_or_path
return None, pretrained_model_name_or_path, adapter_kwargs
token = download_kwargs.get("token")
@ -670,4 +670,4 @@ def maybe_load_adapters(
_adapter_model_path = pretrained_model_name_or_path
pretrained_model_name_or_path = json.load(f)["base_model_name_or_path"]
return _adapter_model_path, pretrained_model_name_or_path
return _adapter_model_path, pretrained_model_name_or_path, adapter_kwargs

View File

@ -752,8 +752,6 @@ def extract_hyperparameters_from_trainer(trainer):
hyperparameters["optimizer"] = f"Use {optimizer_name} and the args are:\n{optimizer_args}"
hyperparameters["lr_scheduler_type"] = trainer.args.lr_scheduler_type.value
if trainer.args.warmup_ratio != 0.0:
hyperparameters["lr_scheduler_warmup_ratio"] = trainer.args.warmup_ratio
if trainer.args.warmup_steps != 0.0:
hyperparameters["lr_scheduler_warmup_steps"] = trainer.args.warmup_steps
if trainer.args.max_steps != -1:

View File

@ -14,7 +14,7 @@
import math
from functools import wraps
from typing import Optional, TypedDict, Union
from typing import Optional, TypedDict
from .configuration_utils import PreTrainedConfig
from .utils import is_torch_available, logging
@ -27,7 +27,7 @@ if is_torch_available():
import torch
def standardize_rope_params(config, rope_theta: Optional[Union[float, dict[str, float]]] = None):
def standardize_rope_params(config, rope_theta: float | dict[str, float] | None = None):
"""
Helper to standardize the config's rope params field by ensuring the params are defined for each
later type. For old model the fn will duplicate a single rope param in each layer type (backward compatibility)

View File

@ -4353,7 +4353,7 @@ class PreTrainedModel(nn.Module, EmbeddingAccessMixin, ModuleUtilsMixin, PushToH
if adapter_kwargs is None:
adapter_kwargs = {}
_adapter_model_path, pretrained_model_name_or_path = maybe_load_adapters(
_adapter_model_path, pretrained_model_name_or_path, adapter_kwargs = maybe_load_adapters(
pretrained_model_name_or_path,
download_kwargs_with_commit,
**adapter_kwargs,
@ -5413,9 +5413,7 @@ class PreTrainedAudioTokenizerBase(PreTrainedModel):
"""
Encode raw audio retrieved from a respective `FeatureExtractor` into discrete audio codebooks (with x channels)
"""
pass
@abstractmethod
def decode(self, audio_codes: torch.Tensor, *args, **kwargs):
"""Decode from discrete audio codebooks back to raw audio"""
pass

View File

@ -15,11 +15,7 @@
# limitations under the License.
"""ALBERT model configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
class AlbertConfig(PreTrainedConfig):
@ -142,21 +138,4 @@ class AlbertConfig(PreTrainedConfig):
self.classifier_dropout_prob = classifier_dropout_prob
# Copied from transformers.models.bert.configuration_bert.BertOnnxConfig with Roberta->Albert
class AlbertOnnxConfig(OnnxConfig):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task == "multiple-choice":
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
else:
dynamic_axis = {0: "batch", 1: "sequence"}
return OrderedDict(
[
("input_ids", dynamic_axis),
("attention_mask", dynamic_axis),
("token_type_ids", dynamic_axis),
]
)
__all__ = ["AlbertConfig", "AlbertOnnxConfig"]
__all__ = ["AlbertConfig"]

View File

@ -1155,8 +1155,6 @@ class AriaTextMoELayer(nn.Module):
class AriaTextAttention(LlamaAttention):
"""Multi-headed attention from 'Attention Is All You Need' paper"""
pass
class AriaTextDecoderLayer(LlamaDecoderLayer):
"""

View File

@ -114,7 +114,6 @@ else:
("ijepa", ("ViTImageProcessor", "ViTImageProcessorFast")),
("imagegpt", ("ImageGPTImageProcessor", "ImageGPTImageProcessorFast")),
("instructblip", ("BlipImageProcessor", "BlipImageProcessorFast")),
("instructblipvideo", ("InstructBlipVideoImageProcessor", None)),
("janus", ("JanusImageProcessor", "JanusImageProcessorFast")),
("kosmos-2", ("CLIPImageProcessor", "CLIPImageProcessorFast")),
("kosmos-2.5", ("Kosmos2_5ImageProcessor", "Kosmos2_5ImageProcessorFast")),
@ -122,11 +121,11 @@ else:
("layoutlmv3", ("LayoutLMv3ImageProcessor", "LayoutLMv3ImageProcessorFast")),
("levit", ("LevitImageProcessor", "LevitImageProcessorFast")),
("lfm2_vl", (None, "Lfm2VlImageProcessorFast")),
("lightglue", ("LightGlueImageProcessor", None)),
("lightglue", ("LightGlueImageProcessor", "LightGlueImageProcessorFast")),
("llama4", ("Llama4ImageProcessor", "Llama4ImageProcessorFast")),
("llava", ("LlavaImageProcessor", "LlavaImageProcessorFast")),
("llava_next", ("LlavaNextImageProcessor", "LlavaNextImageProcessorFast")),
("llava_next_video", ("LlavaNextVideoImageProcessor", None)),
("llava_next_video", ("LlavaNextImageProcessor", "LlavaNextImageProcessorFast")),
("llava_onevision", ("LlavaOnevisionImageProcessor", "LlavaOnevisionImageProcessorFast")),
("mask2former", ("Mask2FormerImageProcessor", "Mask2FormerImageProcessorFast")),
("maskformer", ("MaskFormerImageProcessor", "MaskFormerImageProcessorFast")),

View File

@ -486,13 +486,11 @@ def segment_sum(input_tensor):
return tensor_segsum
is_fast_path_available = all((selective_state_update, causal_conv1d_fn, causal_conv1d_update))
def apply_mask_to_padding_states(hidden_states, attention_mask):
"""
Tunes out the hidden states for padding tokens, see https://github.com/state-spaces/mamba/issues/66
"""
# NOTE: attention mask is a 2D boolean tensor
if attention_mask is not None and attention_mask.shape[1] > 1 and attention_mask.shape[0] > 1:
dtype = hidden_states.dtype
hidden_states = (hidden_states * attention_mask[:, :, None]).to(dtype)
@ -500,6 +498,9 @@ def apply_mask_to_padding_states(hidden_states, attention_mask):
return hidden_states
is_fast_path_available = all((selective_state_update, causal_conv1d_fn, causal_conv1d_update))
# Adapted from transformers.models.mamba2.modeling_mamba2.Mamba2Mixer
class BambaMixer(nn.Module):
"""

View File

@ -36,6 +36,7 @@ from transformers.models.llama.modeling_llama import (
)
from transformers.models.mamba2.modeling_mamba2 import (
MambaRMSNormGated,
apply_mask_to_padding_states,
pad_tensor_by_size,
reshape_into_chunks,
segment_sum,
@ -203,17 +204,6 @@ class BambaRMSNormGated(MambaRMSNormGated):
pass
def apply_mask_to_padding_states(hidden_states, attention_mask):
"""
Tunes out the hidden states for padding tokens, see https://github.com/state-spaces/mamba/issues/66
"""
if attention_mask is not None and attention_mask.shape[1] > 1 and attention_mask.shape[0] > 1:
dtype = hidden_states.dtype
hidden_states = (hidden_states * attention_mask[:, :, None]).to(dtype)
return hidden_states
# Adapted from transformers.models.mamba2.modeling_mamba2.Mamba2Mixer
class BambaMixer(nn.Module):
"""

View File

@ -235,7 +235,6 @@ class BarkFineGenerationConfig(GenerationConfig):
Overrides GenerationConfig.validate because BarkFineGenerationConfig don't use any parameters outside
temperature.
"""
pass
class BarkGenerationConfig(GenerationConfig):

View File

@ -1318,7 +1318,7 @@ class BarkFineModel(BarkPreTrainedModel):
output sound according to specific predefined voice.
"""
)
class BarkModel(BarkPreTrainedModel):
class BarkModel(BarkPreTrainedModel, GenerationMixin):
config: BarkConfig
def __init__(self, config):

View File

@ -15,15 +15,9 @@
"""BART model configuration"""
import warnings
from collections import OrderedDict
from collections.abc import Mapping
from typing import Any
from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import is_torch_available, logging
from ...utils import logging
logger = logging.get_logger(__name__)
@ -180,223 +174,4 @@ class BartConfig(PreTrainedConfig):
)
class BartOnnxConfig(OnnxSeq2SeqConfigWithPast):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task in ["default", "seq2seq-lm"]:
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
]
)
if self.use_past:
common_inputs["decoder_input_ids"] = {0: "batch"}
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
else:
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
if self.use_past:
self.fill_with_past_key_values_(common_inputs, direction="inputs")
elif self.task == "causal-lm":
# TODO: figure this case out.
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
]
)
if self.use_past:
num_encoder_layers, _ = self.num_layers
for i in range(num_encoder_layers):
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
else:
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
]
)
return common_inputs
@property
def outputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task in ["default", "seq2seq-lm"]:
common_outputs = super().outputs
else:
common_outputs = super(OnnxConfigWithPast, self).outputs
if self.use_past:
num_encoder_layers, _ = self.num_layers
for i in range(num_encoder_layers):
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
return common_outputs
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, seq_length, is_pair
)
# Generate decoder inputs
decoder_seq_length = seq_length if not self.use_past else 1
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, decoder_seq_length, is_pair
)
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
common_inputs = dict(**encoder_inputs, **decoder_inputs)
if self.use_past:
if not is_torch_available():
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
else:
import torch
batch, encoder_seq_length = common_inputs["input_ids"].shape
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
encoder_shape = (
batch,
num_encoder_attention_heads,
encoder_seq_length,
self._config.hidden_size // num_encoder_attention_heads,
)
decoder_past_length = decoder_seq_length + 3
decoder_shape = (
batch,
num_decoder_attention_heads,
decoder_past_length,
self._config.hidden_size // num_decoder_attention_heads,
)
common_inputs["decoder_attention_mask"] = torch.cat(
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
)
common_inputs["past_key_values"] = []
# If the number of encoder and decoder layers are present in the model configuration, both are considered
num_encoder_layers, num_decoder_layers = self.num_layers
min_num_layers = min(num_encoder_layers, num_decoder_layers)
max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"
for _ in range(min_num_layers):
common_inputs["past_key_values"].append(
(
torch.zeros(decoder_shape),
torch.zeros(decoder_shape),
torch.zeros(encoder_shape),
torch.zeros(encoder_shape),
)
)
# TODO: test this.
shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
for _ in range(min_num_layers, max_num_layers):
common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
return common_inputs
def _generate_dummy_inputs_for_causal_lm(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, seq_length, is_pair
)
if self.use_past:
if not is_torch_available():
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
else:
import torch
batch, seqlen = common_inputs["input_ids"].shape
# Not using the same length for past_key_values
past_key_values_length = seqlen + 2
num_encoder_layers, _ = self.num_layers
num_encoder_attention_heads, _ = self.num_attention_heads
past_shape = (
batch,
num_encoder_attention_heads,
past_key_values_length,
self._config.hidden_size // num_encoder_attention_heads,
)
mask_dtype = common_inputs["attention_mask"].dtype
common_inputs["attention_mask"] = torch.cat(
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
)
common_inputs["past_key_values"] = [
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
]
return common_inputs
def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
# Copied from OnnxConfig.generate_dummy_inputs
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
batch_size = compute_effective_axis_dimension(
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
)
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
seq_length = compute_effective_axis_dimension(
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
)
# Generate dummy inputs according to compute batch and sequence
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
return common_inputs
def generate_dummy_inputs(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
if self.task in ["default", "seq2seq-lm"]:
common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
elif self.task == "causal-lm":
common_inputs = self._generate_dummy_inputs_for_causal_lm(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
else:
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
return common_inputs
def _flatten_past_key_values_(self, flattened_output, name, idx, t):
if self.task in ["default", "seq2seq-lm"]:
flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
else:
flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
flattened_output, name, idx, t
)
__all__ = ["BartConfig", "BartOnnxConfig"]
__all__ = ["BartConfig"]

View File

@ -538,12 +538,12 @@ class BartEncoder(BartPreTrainedModel):
self.max_source_positions = config.max_position_embeddings
embed_scale = math.sqrt(embed_dim) if config.scale_embedding else 1.0
self.embed_tokens = BartScaledWordEmbedding(
config.vocab_size, embed_dim, self.padding_idx, embed_scale=embed_scale
)
if embed_tokens is not None:
self.embed_tokens.weight = embed_tokens.weight
self.embed_tokens = embed_tokens
else:
self.embed_tokens = BartScaledWordEmbedding(
config.vocab_size, embed_dim, self.padding_idx, embed_scale=embed_scale
)
self.embed_positions = BartLearnedPositionalEmbedding(
config.max_position_embeddings,
@ -682,12 +682,12 @@ class BartDecoder(BartPreTrainedModel):
self.max_target_positions = config.max_position_embeddings
embed_scale = math.sqrt(config.d_model) if config.scale_embedding else 1.0
self.embed_tokens = BartScaledWordEmbedding(
config.vocab_size, config.d_model, self.padding_idx, embed_scale=embed_scale
)
if embed_tokens is not None:
self.embed_tokens.weight = embed_tokens.weight
self.embed_tokens = embed_tokens
else:
self.embed_tokens = BartScaledWordEmbedding(
config.vocab_size, config.d_model, self.padding_idx, embed_scale=embed_scale
)
self.embed_positions = BartLearnedPositionalEmbedding(
config.max_position_embeddings,

View File

@ -15,13 +15,8 @@
"""BEiT model configuration"""
import warnings
from collections import OrderedDict
from collections.abc import Mapping
from packaging import version
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
@ -209,21 +204,4 @@ class BeitConfig(BackboneConfigMixin, PreTrainedConfig):
self.reshape_hidden_states = reshape_hidden_states
# Copied from transformers.models.vit.configuration_vit.ViTOnnxConfig
class BeitOnnxConfig(OnnxConfig):
torch_onnx_minimum_version = version.parse("1.11")
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
return OrderedDict(
[
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
]
)
@property
def atol_for_validation(self) -> float:
return 1e-4
__all__ = ["BeitConfig", "BeitOnnxConfig"]
__all__ = ["BeitConfig"]

View File

@ -15,11 +15,7 @@
# limitations under the License.
"""BERT model configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
@ -127,20 +123,4 @@ class BertConfig(PreTrainedConfig):
self.classifier_dropout = classifier_dropout
class BertOnnxConfig(OnnxConfig):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task == "multiple-choice":
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
else:
dynamic_axis = {0: "batch", 1: "sequence"}
return OrderedDict(
[
("input_ids", dynamic_axis),
("attention_mask", dynamic_axis),
("token_type_ids", dynamic_axis),
]
)
__all__ = ["BertConfig", "BertOnnxConfig"]
__all__ = ["BertConfig"]

View File

@ -14,11 +14,7 @@
# limitations under the License.
"""BigBird model configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
@ -158,19 +154,4 @@ class BigBirdConfig(PreTrainedConfig):
self.classifier_dropout = classifier_dropout
class BigBirdOnnxConfig(OnnxConfig):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task == "multiple-choice":
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
else:
dynamic_axis = {0: "batch", 1: "sequence"}
return OrderedDict(
[
("input_ids", dynamic_axis),
("attention_mask", dynamic_axis),
]
)
__all__ = ["BigBirdConfig", "BigBirdOnnxConfig"]
__all__ = ["BigBirdConfig"]

View File

@ -14,15 +14,8 @@
# limitations under the License.
"""BigBirdPegasus model configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from typing import Any
from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import is_torch_available, logging
from ...utils import logging
logger = logging.get_logger(__name__)
@ -186,224 +179,4 @@ class BigBirdPegasusConfig(PreTrainedConfig):
)
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig with Bart->BigBirdPegasus
class BigBirdPegasusOnnxConfig(OnnxSeq2SeqConfigWithPast):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task in ["default", "seq2seq-lm"]:
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
]
)
if self.use_past:
common_inputs["decoder_input_ids"] = {0: "batch"}
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
else:
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
if self.use_past:
self.fill_with_past_key_values_(common_inputs, direction="inputs")
elif self.task == "causal-lm":
# TODO: figure this case out.
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
]
)
if self.use_past:
num_encoder_layers, _ = self.num_layers
for i in range(num_encoder_layers):
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
else:
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
]
)
return common_inputs
@property
def outputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task in ["default", "seq2seq-lm"]:
common_outputs = super().outputs
else:
common_outputs = super(OnnxConfigWithPast, self).outputs
if self.use_past:
num_encoder_layers, _ = self.num_layers
for i in range(num_encoder_layers):
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
return common_outputs
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, seq_length, is_pair
)
# Generate decoder inputs
decoder_seq_length = seq_length if not self.use_past else 1
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, decoder_seq_length, is_pair
)
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
common_inputs = dict(**encoder_inputs, **decoder_inputs)
if self.use_past:
if not is_torch_available():
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
else:
import torch
batch, encoder_seq_length = common_inputs["input_ids"].shape
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
encoder_shape = (
batch,
num_encoder_attention_heads,
encoder_seq_length,
self._config.hidden_size // num_encoder_attention_heads,
)
decoder_past_length = decoder_seq_length + 3
decoder_shape = (
batch,
num_decoder_attention_heads,
decoder_past_length,
self._config.hidden_size // num_decoder_attention_heads,
)
common_inputs["decoder_attention_mask"] = torch.cat(
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
)
common_inputs["past_key_values"] = []
# If the number of encoder and decoder layers are present in the model configuration, both are considered
num_encoder_layers, num_decoder_layers = self.num_layers
min_num_layers = min(num_encoder_layers, num_decoder_layers)
max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"
for _ in range(min_num_layers):
common_inputs["past_key_values"].append(
(
torch.zeros(decoder_shape),
torch.zeros(decoder_shape),
torch.zeros(encoder_shape),
torch.zeros(encoder_shape),
)
)
# TODO: test this.
shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
for _ in range(min_num_layers, max_num_layers):
common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
return common_inputs
def _generate_dummy_inputs_for_causal_lm(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, seq_length, is_pair
)
if self.use_past:
if not is_torch_available():
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
else:
import torch
batch, seqlen = common_inputs["input_ids"].shape
# Not using the same length for past_key_values
past_key_values_length = seqlen + 2
num_encoder_layers, _ = self.num_layers
num_encoder_attention_heads, _ = self.num_attention_heads
past_shape = (
batch,
num_encoder_attention_heads,
past_key_values_length,
self._config.hidden_size // num_encoder_attention_heads,
)
mask_dtype = common_inputs["attention_mask"].dtype
common_inputs["attention_mask"] = torch.cat(
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
)
common_inputs["past_key_values"] = [
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
]
return common_inputs
def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
# Copied from OnnxConfig.generate_dummy_inputs
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
batch_size = compute_effective_axis_dimension(
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
)
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
seq_length = compute_effective_axis_dimension(
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
)
# Generate dummy inputs according to compute batch and sequence
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
return common_inputs
def generate_dummy_inputs(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
if self.task in ["default", "seq2seq-lm"]:
common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
elif self.task == "causal-lm":
common_inputs = self._generate_dummy_inputs_for_causal_lm(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
else:
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
return common_inputs
def _flatten_past_key_values_(self, flattened_output, name, idx, t):
if self.task in ["default", "seq2seq-lm"]:
flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
else:
flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
flattened_output, name, idx, t
)
__all__ = ["BigBirdPegasusConfig", "BigBirdPegasusOnnxConfig"]
__all__ = ["BigBirdPegasusConfig"]

View File

@ -14,15 +14,7 @@
# limitations under the License.
"""Blenderbot model configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from typing import Any
from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...file_utils import is_torch_available
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import logging
@ -166,227 +158,4 @@ class BlenderbotConfig(PreTrainedConfig):
)
class BlenderbotOnnxConfig(OnnxSeq2SeqConfigWithPast):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task in ["default", "seq2seq-lm"]:
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
]
)
if self.use_past:
common_inputs["decoder_input_ids"] = {0: "batch"}
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
else:
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
if self.use_past:
self.fill_with_past_key_values_(common_inputs, direction="inputs")
elif self.task == "causal-lm":
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
]
)
if self.use_past:
_, num_decoder_layers = self.num_layers
for i in range(num_decoder_layers):
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
else:
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
]
)
return common_inputs
@property
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.outputs
def outputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task in ["default", "seq2seq-lm"]:
common_outputs = super().outputs
else:
common_outputs = super(OnnxConfigWithPast, self).outputs
if self.use_past:
num_encoder_layers, _ = self.num_layers
for i in range(num_encoder_layers):
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
return common_outputs
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, seq_length, is_pair
)
# Generate decoder inputs
decoder_seq_length = seq_length if not self.use_past else 1
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, decoder_seq_length, is_pair
)
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
common_inputs = dict(**encoder_inputs, **decoder_inputs)
if self.use_past:
if not is_torch_available():
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
else:
import torch
batch, encoder_seq_length = common_inputs["input_ids"].shape
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
encoder_shape = (
batch,
num_encoder_attention_heads,
encoder_seq_length,
self._config.hidden_size // num_encoder_attention_heads,
)
decoder_past_length = decoder_seq_length
decoder_shape = (
batch,
num_decoder_attention_heads,
decoder_past_length,
self._config.hidden_size // num_decoder_attention_heads,
)
common_inputs["decoder_attention_mask"] = torch.cat(
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
)
common_inputs["past_key_values"] = []
_, num_decoder_layers = self.num_layers
for _ in range(num_decoder_layers):
common_inputs["past_key_values"].append(
(
torch.zeros(decoder_shape),
torch.zeros(decoder_shape),
torch.zeros(encoder_shape),
torch.zeros(encoder_shape),
)
)
return common_inputs
def _generate_dummy_inputs_for_causal_lm(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, seq_length, is_pair
)
if self.use_past:
if not is_torch_available():
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
else:
import torch
batch, seqlen = common_inputs["input_ids"].shape
past_key_values_length = seqlen
_, num_decoder_layers = self.num_layers
num_encoder_attention_heads, _ = self.num_attention_heads
past_shape = (
batch,
num_encoder_attention_heads,
past_key_values_length,
self._config.hidden_size // num_encoder_attention_heads,
)
mask_dtype = common_inputs["attention_mask"].dtype
common_inputs["attention_mask"] = torch.cat(
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
)
common_inputs["past_key_values"] = [
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_decoder_layers)
]
return common_inputs
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._generate_dummy_inputs_for_sequence_classification_and_question_answering
def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
# Copied from OnnxConfig.generate_dummy_inputs
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
batch_size = compute_effective_axis_dimension(
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
)
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
seq_length = compute_effective_axis_dimension(
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
)
# Generate dummy inputs according to compute batch and sequence
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
return common_inputs
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.generate_dummy_inputs
def generate_dummy_inputs(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
if self.task in ["default", "seq2seq-lm"]:
common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
elif self.task == "causal-lm":
common_inputs = self._generate_dummy_inputs_for_causal_lm(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
else:
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
return common_inputs
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._flatten_past_key_values_
def _flatten_past_key_values_(self, flattened_output, name, idx, t):
if self.task in ["default", "seq2seq-lm"]:
flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
else:
flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
flattened_output, name, idx, t
)
def fill_with_past_key_values_(self, inputs_or_outputs: Mapping[str, Mapping[int, str]], direction: str):
if direction not in ["inputs", "outputs"]:
raise ValueError(f'direction must either be "inputs" or "outputs", but {direction} was given')
name = "past_key_values" if direction == "inputs" else "present"
_, num_decoder_layers = self.num_layers
encoder_sequence = "past_encoder_sequence"
decoder_sequence = "past_decoder_sequence" if direction == "inputs" else "past_decoder_sequence + sequence"
for i in range(num_decoder_layers):
inputs_or_outputs[f"{name}.{i}.decoder.key"] = {0: "batch", 2: decoder_sequence}
inputs_or_outputs[f"{name}.{i}.decoder.value"] = {0: "batch", 2: decoder_sequence}
inputs_or_outputs[f"{name}.{i}.encoder.key"] = {0: "batch", 2: encoder_sequence}
inputs_or_outputs[f"{name}.{i}.encoder.value"] = {0: "batch", 2: encoder_sequence}
__all__ = ["BlenderbotConfig", "BlenderbotOnnxConfig"]
__all__ = ["BlenderbotConfig"]

View File

@ -14,15 +14,7 @@
# limitations under the License.
"""BlenderbotSmall model configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from typing import Any
from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...file_utils import is_torch_available
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import logging
@ -164,224 +156,4 @@ class BlenderbotSmallConfig(PreTrainedConfig):
)
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig with Bart->BlenderbotSmall
class BlenderbotSmallOnnxConfig(OnnxSeq2SeqConfigWithPast):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task in ["default", "seq2seq-lm"]:
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
]
)
if self.use_past:
common_inputs["decoder_input_ids"] = {0: "batch"}
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
else:
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
if self.use_past:
self.fill_with_past_key_values_(common_inputs, direction="inputs")
elif self.task == "causal-lm":
# TODO: figure this case out.
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
]
)
if self.use_past:
num_encoder_layers, _ = self.num_layers
for i in range(num_encoder_layers):
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
else:
common_inputs = OrderedDict(
[
("input_ids", {0: "batch", 1: "encoder_sequence"}),
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
]
)
return common_inputs
@property
def outputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task in ["default", "seq2seq-lm"]:
common_outputs = super().outputs
else:
common_outputs = super(OnnxConfigWithPast, self).outputs
if self.use_past:
num_encoder_layers, _ = self.num_layers
for i in range(num_encoder_layers):
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
return common_outputs
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, seq_length, is_pair
)
# Generate decoder inputs
decoder_seq_length = seq_length if not self.use_past else 1
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, decoder_seq_length, is_pair
)
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
common_inputs = dict(**encoder_inputs, **decoder_inputs)
if self.use_past:
if not is_torch_available():
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
else:
import torch
batch, encoder_seq_length = common_inputs["input_ids"].shape
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
encoder_shape = (
batch,
num_encoder_attention_heads,
encoder_seq_length,
self._config.hidden_size // num_encoder_attention_heads,
)
decoder_past_length = decoder_seq_length + 3
decoder_shape = (
batch,
num_decoder_attention_heads,
decoder_past_length,
self._config.hidden_size // num_decoder_attention_heads,
)
common_inputs["decoder_attention_mask"] = torch.cat(
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
)
common_inputs["past_key_values"] = []
# If the number of encoder and decoder layers are present in the model configuration, both are considered
num_encoder_layers, num_decoder_layers = self.num_layers
min_num_layers = min(num_encoder_layers, num_decoder_layers)
max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"
for _ in range(min_num_layers):
common_inputs["past_key_values"].append(
(
torch.zeros(decoder_shape),
torch.zeros(decoder_shape),
torch.zeros(encoder_shape),
torch.zeros(encoder_shape),
)
)
# TODO: test this.
shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
for _ in range(min_num_layers, max_num_layers):
common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
return common_inputs
def _generate_dummy_inputs_for_causal_lm(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size, seq_length, is_pair
)
if self.use_past:
if not is_torch_available():
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
else:
import torch
batch, seqlen = common_inputs["input_ids"].shape
# Not using the same length for past_key_values
past_key_values_length = seqlen + 2
num_encoder_layers, _ = self.num_layers
num_encoder_attention_heads, _ = self.num_attention_heads
past_shape = (
batch,
num_encoder_attention_heads,
past_key_values_length,
self._config.hidden_size // num_encoder_attention_heads,
)
mask_dtype = common_inputs["attention_mask"].dtype
common_inputs["attention_mask"] = torch.cat(
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
)
common_inputs["past_key_values"] = [
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
]
return common_inputs
def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
# Copied from OnnxConfig.generate_dummy_inputs
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
batch_size = compute_effective_axis_dimension(
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
)
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
seq_length = compute_effective_axis_dimension(
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
)
# Generate dummy inputs according to compute batch and sequence
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
return common_inputs
def generate_dummy_inputs(
self,
tokenizer: PreTrainedTokenizer,
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
if self.task in ["default", "seq2seq-lm"]:
common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
elif self.task == "causal-lm":
common_inputs = self._generate_dummy_inputs_for_causal_lm(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
else:
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
)
return common_inputs
def _flatten_past_key_values_(self, flattened_output, name, idx, t):
if self.task in ["default", "seq2seq-lm"]:
flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
else:
flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
flattened_output, name, idx, t
)
__all__ = ["BlenderbotSmallConfig", "BlenderbotSmallOnnxConfig"]
__all__ = ["BlenderbotSmallConfig"]

View File

@ -14,19 +14,8 @@
# limitations under the License.
"""Bloom configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Optional
from packaging import version
if TYPE_CHECKING:
from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfigWithPast, PatchingSpec
from ...utils import is_torch_available, logging
from ...utils import logging
logger = logging.get_logger(__name__)
@ -142,99 +131,4 @@ class BloomConfig(PreTrainedConfig):
super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
class BloomOnnxConfig(OnnxConfigWithPast):
torch_onnx_minimum_version = version.parse("1.12")
def __init__(
self,
config: PreTrainedConfig,
task: str = "default",
patching_specs: Optional[list[PatchingSpec]] = None,
use_past: bool = False,
):
super().__init__(config, task=task, patching_specs=patching_specs, use_past=use_past)
if not getattr(self._config, "pad_token_id", None):
# TODO: how to do that better?
self._config.pad_token_id = 0
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
if self.use_past:
# BLOOM stores values on dynamic axis 2. For more details see: https://github.com/huggingface/transformers/pull/18344
self.fill_with_past_key_values_(common_inputs, direction="inputs", inverted_values_shape=True)
common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
else:
common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}
return common_inputs
@property
def num_layers(self) -> int:
return self._config.n_layer
@property
def num_attention_heads(self) -> int:
return self._config.n_head
@property
def atol_for_validation(self) -> float:
return 1e-3
def generate_dummy_inputs(
self,
tokenizer: "PreTrainedTokenizer",
batch_size: int = -1,
seq_length: int = -1,
is_pair: bool = False,
) -> Mapping[str, Any]:
common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
tokenizer,
batch_size=batch_size,
seq_length=seq_length,
is_pair=is_pair,
)
# We need to order the input in the way they appears in the forward()
ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})
# Need to add the past_keys
if self.use_past:
if not is_torch_available():
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
else:
import torch
batch, seqlen = common_inputs["input_ids"].shape
# Not using the same length for past_key_values
past_key_values_length = seqlen + 2
head_dim = self._config.hidden_size // self.num_attention_heads
past_key_shape = (
batch * self.num_attention_heads,
head_dim,
past_key_values_length,
)
past_value_shape = (
batch * self.num_attention_heads,
past_key_values_length,
head_dim,
)
ordered_inputs["past_key_values"] = [
(torch.zeros(past_key_shape), torch.zeros(past_value_shape)) for _ in range(self.num_layers)
]
ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
if self.use_past:
mask_dtype = ordered_inputs["attention_mask"].dtype
ordered_inputs["attention_mask"] = torch.cat(
[ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
)
return ordered_inputs
@property
def default_onnx_opset(self) -> int:
return 13
__all__ = ["BloomConfig", "BloomOnnxConfig"]
__all__ = ["BloomConfig"]

View File

@ -15,11 +15,7 @@
# limitations under the License.
"""CamemBERT configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
@ -129,19 +125,4 @@ class CamembertConfig(PreTrainedConfig):
self.classifier_dropout = classifier_dropout
class CamembertOnnxConfig(OnnxConfig):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
if self.task == "multiple-choice":
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
else:
dynamic_axis = {0: "batch", 1: "sequence"}
return OrderedDict(
[
("input_ids", dynamic_axis),
("attention_mask", dynamic_axis),
]
)
__all__ = ["CamembertConfig", "CamembertOnnxConfig"]
__all__ = ["CamembertConfig"]

View File

@ -14,16 +14,7 @@
# limitations under the License.
"""Chinese-CLIP model configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from ...processing_utils import ProcessorMixin
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
@ -368,52 +359,4 @@ class ChineseCLIPConfig(PreTrainedConfig):
super().__init__(**kwargs)
class ChineseCLIPOnnxConfig(OnnxConfig):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
return OrderedDict(
[
("input_ids", {0: "batch", 1: "sequence"}),
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
("attention_mask", {0: "batch", 1: "sequence"}),
]
)
@property
def outputs(self) -> Mapping[str, Mapping[int, str]]:
return OrderedDict(
[
("logits_per_image", {0: "batch"}),
("logits_per_text", {0: "batch"}),
("text_embeds", {0: "batch"}),
("image_embeds", {0: "batch"}),
]
)
@property
def atol_for_validation(self) -> float:
return 1e-4
def generate_dummy_inputs(
self,
processor: "ProcessorMixin",
batch_size: int = -1,
seq_length: int = -1,
) -> Mapping[str, Any]:
text_input_dict = super().generate_dummy_inputs(
processor.tokenizer,
batch_size=batch_size,
seq_length=seq_length,
)
image_input_dict = super().generate_dummy_inputs(
processor.image_processor,
batch_size=batch_size,
)
return {**text_input_dict, **image_input_dict}
@property
def default_onnx_opset(self) -> int:
return 14
__all__ = ["ChineseCLIPConfig", "ChineseCLIPOnnxConfig", "ChineseCLIPTextConfig", "ChineseCLIPVisionConfig"]
__all__ = ["ChineseCLIPConfig", "ChineseCLIPTextConfig", "ChineseCLIPVisionConfig"]

View File

@ -14,16 +14,7 @@
# limitations under the License.
"""CLIP model configuration"""
from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any
if TYPE_CHECKING:
from ...processing_utils import ProcessorMixin
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
@ -364,52 +355,4 @@ class CLIPConfig(PreTrainedConfig):
super().__init__(**kwargs)
class CLIPOnnxConfig(OnnxConfig):
@property
def inputs(self) -> Mapping[str, Mapping[int, str]]:
return OrderedDict(
[
("input_ids", {0: "batch", 1: "sequence"}),
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
("attention_mask", {0: "batch", 1: "sequence"}),
]
)
@property
def outputs(self) -> Mapping[str, Mapping[int, str]]:
return OrderedDict(
[
("logits_per_image", {0: "batch"}),
("logits_per_text", {0: "batch"}),
("text_embeds", {0: "batch"}),
("image_embeds", {0: "batch"}),
]
)
@property
def atol_for_validation(self) -> float:
return 1e-4
def generate_dummy_inputs(
self,
processor: "ProcessorMixin",
batch_size: int = -1,
seq_length: int = -1,
) -> Mapping[str, Any]:
text_input_dict = super().generate_dummy_inputs(
processor.tokenizer,
batch_size=batch_size,
seq_length=seq_length,
)
image_input_dict = super().generate_dummy_inputs(
processor.image_processor,
batch_size=batch_size,
)
return {**text_input_dict, **image_input_dict}
@property
def default_onnx_opset(self) -> int:
return 14
__all__ = ["CLIPConfig", "CLIPOnnxConfig", "CLIPTextConfig", "CLIPVisionConfig"]
__all__ = ["CLIPConfig", "CLIPTextConfig", "CLIPVisionConfig"]

View File

@ -26,7 +26,17 @@ from ...modeling_attn_mask_utils import _create_4d_causal_attention_mask, _prepa
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
from ...utils import ModelOutput, auto_docstring, can_return_tuple, filter_out_non_signature_kwargs, logging, torch_int
from ...processing_utils import Unpack
from ...utils import (
ModelOutput,
TransformersKwargs,
auto_docstring,
can_return_tuple,
filter_out_non_signature_kwargs,
logging,
torch_int,
)
from ...utils.generic import check_model_inputs
from .configuration_clip import CLIPConfig, CLIPTextConfig, CLIPVisionConfig
@ -260,8 +270,7 @@ def eager_attention_forward(
attention_mask: Optional[torch.Tensor],
scaling: float,
dropout: float = 0.0,
output_attentions: bool = True,
**kwargs,
**kwargs: Unpack[TransformersKwargs],
):
attn_weights = torch.matmul(query, key.transpose(-1, -2)) * scaling
if attention_mask is not None:
@ -271,8 +280,6 @@ def eager_attention_forward(
attn_output = torch.matmul(attn_weights, value)
attn_output = attn_output.transpose(1, 2).contiguous()
if not output_attentions:
attn_weights = None
return attn_output, attn_weights
@ -304,7 +311,7 @@ class CLIPAttention(nn.Module):
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
causal_attention_mask: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = False,
**kwargs: Unpack[TransformersKwargs],
) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
"""Input shape: Batch x Time x Channel"""
@ -340,14 +347,12 @@ class CLIPAttention(nn.Module):
is_causal=self.is_causal,
scaling=self.scale,
dropout=0.0 if not self.training else self.dropout,
output_attentions=output_attentions,
**kwargs,
)
attn_output = attn_output.reshape(batch_size, seq_length, embed_dim).contiguous()
attn_output = self.out_proj(attn_output)
if not output_attentions:
attn_weights = None
return attn_output, attn_weights
@ -380,18 +385,8 @@ class CLIPEncoderLayer(GradientCheckpointingLayer):
hidden_states: torch.Tensor,
attention_mask: torch.Tensor,
causal_attention_mask: torch.Tensor,
output_attentions: Optional[bool] = False,
) -> tuple[torch.FloatTensor]:
"""
Args:
hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
attention_mask (`torch.FloatTensor`): attention mask of size
`(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
`(config.encoder_attention_heads,)`.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under
returned tensors for more detail.
"""
**kwargs: Unpack[TransformersKwargs],
) -> torch.FloatTensor:
residual = hidden_states
hidden_states = self.layer_norm1(hidden_states)
@ -399,7 +394,7 @@ class CLIPEncoderLayer(GradientCheckpointingLayer):
hidden_states=hidden_states,
attention_mask=attention_mask,
causal_attention_mask=causal_attention_mask,
output_attentions=output_attentions,
**kwargs,
)
hidden_states = residual + hidden_states
@ -408,12 +403,7 @@ class CLIPEncoderLayer(GradientCheckpointingLayer):
hidden_states = self.mlp(hidden_states)
hidden_states = residual + hidden_states
outputs = (hidden_states,)
if output_attentions:
outputs += (attn_weights,)
return outputs
return hidden_states
@auto_docstring
@ -426,6 +416,10 @@ class CLIPPreTrainedModel(PreTrainedModel):
_supports_flash_attn = True
_supports_flex_attn = True
_supports_attention_backend = True
_can_record_outputs = {
"hidden_states": CLIPEncoderLayer,
"attentions": CLIPAttention,
}
def _init_weights(self, module):
"""Initialize the weights"""
@ -504,8 +498,7 @@ class CLIPEncoder(nn.Module):
inputs_embeds,
attention_mask: Optional[torch.Tensor] = None,
causal_attention_mask: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
**kwargs: Unpack[TransformersKwargs],
) -> BaseModelOutput:
r"""
Args:
@ -527,46 +520,18 @@ class CLIPEncoder(nn.Module):
- 0 for tokens that are **masked**.
[What are attention masks?](../glossary#attention-mask)
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under
returned tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
for more detail.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
encoder_states = () if output_hidden_states else None
all_attentions = () if output_attentions else None
hidden_states = inputs_embeds
for idx, encoder_layer in enumerate(self.layers):
if output_hidden_states:
encoder_states = encoder_states + (hidden_states,)
layer_outputs = encoder_layer(
for encoder_layer in self.layers:
hidden_states = encoder_layer(
hidden_states,
attention_mask,
causal_attention_mask,
output_attentions=output_attentions,
**kwargs,
)
hidden_states = layer_outputs[0]
if output_attentions:
all_attentions = all_attentions + (layer_outputs[1],)
if output_hidden_states:
encoder_states = encoder_states + (hidden_states,)
return BaseModelOutput(
last_hidden_state=hidden_states,
hidden_states=encoder_states,
attentions=all_attentions,
)
@ -588,14 +553,8 @@ class CLIPTextTransformer(nn.Module):
input_ids: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
**kwargs: Unpack[TransformersKwargs],
) -> BaseModelOutputWithPooling:
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
if input_ids is None:
raise ValueError("You have to specify input_ids")
@ -604,23 +563,18 @@ class CLIPTextTransformer(nn.Module):
hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
# CLIP's text model uses causal mask, prepare it here.
# https://github.com/openai/CLIP/blob/cfcffb90e69f37bf2ff1e988237a0fbe41f33c04/clip/model.py#L324
causal_attention_mask = _create_4d_causal_attention_mask(
input_shape, hidden_states.dtype, device=hidden_states.device
)
# expand attention_mask
if attention_mask is not None and self.config._attn_implementation != "flash_attention_2":
# [batch_size, seq_len] -> [batch_size, 1, tgt_seq_len, src_seq_len]
attention_mask = _prepare_4d_attention_mask(attention_mask, hidden_states.dtype)
encoder_outputs: BaseModelOutput = self.encoder(
inputs_embeds=hidden_states,
attention_mask=attention_mask,
causal_attention_mask=causal_attention_mask,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
**kwargs,
)
last_hidden_state = encoder_outputs.last_hidden_state
@ -651,8 +605,6 @@ class CLIPTextTransformer(nn.Module):
return BaseModelOutputWithPooling(
last_hidden_state=last_hidden_state,
pooler_output=pooled_output,
hidden_states=encoder_outputs.hidden_states,
attentions=encoder_outputs.attentions,
)
@ -680,6 +632,7 @@ class CLIPTextModel(CLIPPreTrainedModel):
def set_input_embeddings(self, value):
self.text_model.embeddings.token_embedding = value
@check_model_inputs()
@can_return_tuple
@auto_docstring
def forward(
@ -687,8 +640,7 @@ class CLIPTextModel(CLIPPreTrainedModel):
input_ids: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
**kwargs: Unpack[TransformersKwargs],
) -> BaseModelOutputWithPooling:
r"""
Examples:
@ -710,8 +662,7 @@ class CLIPTextModel(CLIPPreTrainedModel):
input_ids=input_ids,
attention_mask=attention_mask,
position_ids=position_ids,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
**kwargs,
)
@ -730,15 +681,9 @@ class CLIPVisionTransformer(nn.Module):
def forward(
self,
pixel_values: Optional[torch.FloatTensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
interpolate_pos_encoding: Optional[bool] = False,
**kwargs: Unpack[TransformersKwargs],
) -> BaseModelOutputWithPooling:
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
if pixel_values is None:
raise ValueError("You have to specify pixel_values")
@ -747,8 +692,7 @@ class CLIPVisionTransformer(nn.Module):
encoder_outputs: BaseModelOutput = self.encoder(
inputs_embeds=hidden_states,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
**kwargs,
)
last_hidden_state = encoder_outputs.last_hidden_state
@ -758,8 +702,6 @@ class CLIPVisionTransformer(nn.Module):
return BaseModelOutputWithPooling(
last_hidden_state=last_hidden_state,
pooler_output=pooled_output,
hidden_states=encoder_outputs.hidden_states,
attentions=encoder_outputs.attentions,
)
@ -783,14 +725,14 @@ class CLIPVisionModel(CLIPPreTrainedModel):
def get_input_embeddings(self) -> nn.Module:
return self.vision_model.embeddings.patch_embedding
@check_model_inputs(tie_last_hidden_states=False)
@can_return_tuple
@auto_docstring
def forward(
self,
pixel_values: Optional[torch.FloatTensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
interpolate_pos_encoding: bool = False,
**kwargs: Unpack[TransformersKwargs],
) -> BaseModelOutputWithPooling:
r"""
Example:
@ -815,9 +757,8 @@ class CLIPVisionModel(CLIPPreTrainedModel):
return self.vision_model(
pixel_values=pixel_values,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
interpolate_pos_encoding=interpolate_pos_encoding,
**kwargs,
)
@ -947,9 +888,8 @@ class CLIPModel(CLIPPreTrainedModel):
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
return_loss: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
interpolate_pos_encoding: bool = False,
**kwargs: Unpack[TransformersKwargs],
) -> CLIPOutput:
r"""
return_loss (`bool`, *optional*):
@ -977,25 +917,17 @@ class CLIPModel(CLIPPreTrainedModel):
>>> logits_per_image = outputs.logits_per_image # this is the image-text similarity score
>>> probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
```"""
# Use CLIP model's config for some fields (if specified) instead of those of vision & text components.
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
vision_outputs: BaseModelOutputWithPooling = self.vision_model(
pixel_values=pixel_values,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
interpolate_pos_encoding=interpolate_pos_encoding,
**kwargs,
)
text_outputs: BaseModelOutputWithPooling = self.text_model(
input_ids=input_ids,
attention_mask=attention_mask,
position_ids=position_ids,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
**kwargs,
)
image_embeds = vision_outputs.pooler_output
@ -1054,6 +986,7 @@ class CLIPTextModelWithProjection(CLIPPreTrainedModel):
def set_input_embeddings(self, value):
self.text_model.embeddings.token_embedding = value
@check_model_inputs()
@can_return_tuple
@auto_docstring
def forward(
@ -1061,8 +994,7 @@ class CLIPTextModelWithProjection(CLIPPreTrainedModel):
input_ids: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
**kwargs: Unpack[TransformersKwargs],
) -> CLIPTextModelOutput:
r"""
Examples:
@ -1085,8 +1017,7 @@ class CLIPTextModelWithProjection(CLIPPreTrainedModel):
input_ids=input_ids,
attention_mask=attention_mask,
position_ids=position_ids,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
**kwargs,
)
pooled_output = text_outputs.pooler_output
text_embeds = self.text_projection(pooled_output)
@ -1094,8 +1025,6 @@ class CLIPTextModelWithProjection(CLIPPreTrainedModel):
return CLIPTextModelOutput(
text_embeds=text_embeds,
last_hidden_state=text_outputs.last_hidden_state,
hidden_states=text_outputs.hidden_states,
attentions=text_outputs.attentions,
)
@ -1119,14 +1048,14 @@ class CLIPVisionModelWithProjection(CLIPPreTrainedModel):
def get_input_embeddings(self) -> nn.Module:
return self.vision_model.embeddings.patch_embedding
@check_model_inputs(tie_last_hidden_states=False)
@can_return_tuple
@auto_docstring
def forward(
self,
pixel_values: Optional[torch.FloatTensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
interpolate_pos_encoding: bool = False,
**kwargs: Unpack[TransformersKwargs],
) -> CLIPVisionModelOutput:
r"""
Examples:
@ -1151,9 +1080,8 @@ class CLIPVisionModelWithProjection(CLIPPreTrainedModel):
vision_outputs: BaseModelOutputWithPooling = self.vision_model(
pixel_values=pixel_values,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
interpolate_pos_encoding=interpolate_pos_encoding,
**kwargs,
)
pooled_output = vision_outputs.pooler_output
image_embeds = self.visual_projection(pooled_output)
@ -1161,8 +1089,6 @@ class CLIPVisionModelWithProjection(CLIPPreTrainedModel):
return CLIPVisionModelOutput(
image_embeds=image_embeds,
last_hidden_state=vision_outputs.last_hidden_state,
hidden_states=vision_outputs.hidden_states,
attentions=vision_outputs.attentions,
)
@ -1191,14 +1117,14 @@ class CLIPForImageClassification(CLIPPreTrainedModel):
# Initialize weights and apply final processing
self.post_init()
@check_model_inputs()
@can_return_tuple
@auto_docstring
def forward(
self,
pixel_values: Optional[torch.Tensor] = None,
labels: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
**kwargs: Unpack[TransformersKwargs],
) -> ImageClassifierOutput:
r"""
labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
@ -1206,22 +1132,14 @@ class CLIPForImageClassification(CLIPPreTrainedModel):
config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
outputs: BaseModelOutputWithPooling = self.vision_model(
pixel_values,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
**kwargs,
)
sequence_output = outputs.last_hidden_state
# average pool the patch tokens
sequence_output = torch.mean(sequence_output[:, 1:, :], dim=1)
# apply classifier
logits = self.classifier(sequence_output)
loss = None
@ -1231,8 +1149,6 @@ class CLIPForImageClassification(CLIPPreTrainedModel):
return ImageClassifierOutput(
loss=loss,
logits=logits,
hidden_states=outputs.hidden_states,
attentions=outputs.attentions,
)

Some files were not shown because too many files have changed in this diff Show More