mirror of
https://github.com/huggingface/transformers.git
synced 2025-10-23 19:04:35 +08:00
Compare commits
16 Commits
modernbert
...
vision_vis
Author | SHA1 | Date | |
---|---|---|---|
557ecce22e | |||
f3b187027a | |||
2767a59df9 | |||
c9f1003c70 | |||
b356fce1da | |||
af7f75e682 | |||
34ba5909a2 | |||
fbec904fb0 | |||
a1263dfe7b | |||
1878d6c4ff | |||
a6a18efe53 | |||
e581d2f2ce | |||
1f6822d114 | |||
edb70ae15c | |||
27bc371bea | |||
58c619e809 |
@ -32,7 +32,7 @@
|
||||
To export a 🤗 Transformers model to ONNX, first install an extra dependency:
|
||||
|
||||
```bash
|
||||
pip install optimum-onnx
|
||||
pip install optimum[exporters]
|
||||
```
|
||||
|
||||
To see all available arguments, refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli), or view the help in the command line:
|
||||
@ -111,3 +111,60 @@ optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_s
|
||||
### Exporting a model for an unsupported architecture
|
||||
|
||||
If you want to contribute by adding support for a model that cannot currently be exported, first check whether it is supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview), and if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute) directly.
|
||||
|
||||
### Exporting a model with `transformers.onnx`
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
`transformers.onnx` is no longer supported. Please export models with 🤗 Optimum as described above. This section will be removed in future versions.
|
||||
|
||||
</Tip>
|
||||
|
||||
To export a 🤗 Transformers model to ONNX with `transformers.onnx`, install the extra dependencies:
|
||||
|
||||
```bash
|
||||
pip install transformers[onnx]
|
||||
```
|
||||
|
||||
Use the `transformers.onnx` package as a Python module to export a checkpoint using a ready-made configuration:
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
|
||||
```
|
||||
|
||||
This exports an ONNX graph of the checkpoint defined by the `--model` argument. Pass any checkpoint on the 🤗 Hub or one that is stored locally.
|
||||
The resulting `model.onnx` file can then be run on one of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows:
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoTokenizer
|
||||
>>> from onnxruntime import InferenceSession
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
|
||||
>>> session = InferenceSession("onnx/model.onnx")
|
||||
>>> # ONNX Runtime expects NumPy arrays as input
|
||||
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
|
||||
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
|
||||
```
|
||||
|
||||
The required output names (such as `["last_hidden_state"]`) can be obtained by looking at the ONNX configuration of each model. For example, for DistilBERT we have:
|
||||
|
||||
```python
|
||||
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
|
||||
|
||||
>>> config = DistilBertConfig()
|
||||
>>> onnx_config = DistilBertOnnxConfig(config)
|
||||
>>> print(list(onnx_config.outputs.keys()))
|
||||
["last_hidden_state"]
|
||||
```
|
||||
|
||||
The process is identical for TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint as follows:
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
|
||||
```
|
||||
|
||||
To export a model that is stored locally, save the model's weights and tokenizer files in the same directory (for example `local-pt-checkpoint`), then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to the desired directory:
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=local-pt-checkpoint onnx/
|
||||
```
|
@ -88,8 +88,6 @@
|
||||
      title: Tool use
    - local: chat_templating_writing
      title: Writing a chat template
    - local: chat_response_parsing
      title: Response parsing
    title: Chat with models
  - sections:
    - local: serving
|
||||
|
@ -95,12 +95,9 @@ print(tokenizer.decode(outputs[0][len(inputs["input_ids"][0]):]))
|
||||
|
||||
The chat model called the `get_current_temperature` tool with the correct parameters from the docstring. It inferred France as the location based on Paris, and that it should use Celsius for the units of temperature.
|
||||
|
||||
A model **cannot actually call the tool itself**. It requests a tool call, and it's your job to handle the call and append it and the result to the chat history. For
|
||||
models that support [response parsing](./chat_response_parsing), the response parsing will be handled automatically, and you can just use
|
||||
[`~PreTrainedTokenizer.parse_response`] to extract the tool call. For other models, you'll need to manually translate the output
|
||||
string into a tool call dict.
|
||||
A model **cannot actually call the tool itself**. It requests a tool call, and it's your job to handle the call and append it and the result to the chat history.
|
||||
|
||||
Regardless of the approach you use, the tool call should go in the `tool_calls` key of an `assistant` message. This is the recommended API, and should be supported by the chat template of most tool-using models.
|
||||
Hold the call in the `tool_calls` key of an `assistant` message. This is the recommended API, and should be supported by the chat template of most tool-using models.
|
||||
|
||||
> [!WARNING]
|
||||
> Although `tool_calls` is similar to the OpenAI API, the OpenAI API uses a JSON string as its `tool_calls` format. This may cause errors or strange model behavior if used in Transformers, which expects a dict.
|
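The difference described in the warning can be illustrated with a small conversion helper. This is a minimal sketch, assuming an OpenAI-style tool call whose `arguments` field is a JSON-encoded string; the helper name `openai_tool_call_to_dict` and the sample `id` value are hypothetical and not part of Transformers.

```python
import json


def openai_tool_call_to_dict(tool_call: dict) -> dict:
    """Return a tool call whose `arguments` is a dict rather than a JSON string."""
    function = dict(tool_call["function"])  # shallow copy, the input is left untouched
    if isinstance(function["arguments"], str):
        function["arguments"] = json.loads(function["arguments"])
    return {"type": "function", "function": function}


# Hypothetical OpenAI-style tool call: `arguments` is a JSON string.
openai_style = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "arguments": '{"location": "Paris, France", "unit": "celsius"}',
    },
}

# The assistant message holds the converted call in its `tool_calls` key.
message = {"role": "assistant", "tool_calls": [openai_tool_call_to_dict(openai_style)]}
print(message["tool_calls"][0]["function"]["arguments"]["location"])
```

Converting once, at the boundary where the tool call enters the chat history, keeps the rest of the code working with plain dicts.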
||||
|
@ -1,233 +0,0 @@
|
||||
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Response Parsing
|
||||
|
||||
It is increasingly common for chat models to generate structured outputs, rather than just a single reply string.
|
||||
The most common uses for structured outputs are [tool calling](./chat_extras) and [reasoning models](https://huggingface.co/reasoning-course).
|
||||
Tool calling models can output tool calls, containing the name of the tool to call and any arguments to be passed to it,
|
||||
while reasoning models often output reasoning steps as a "chain of thought". Some recent models even use both of these,
|
||||
and may output reasoning and/or one or more tool calls before their final answer.
|
||||
|
||||
Models with structured outputs pose a challenge for chat templating, because the output needs to be parsed before it
|
||||
can be appended to the chat. For a concrete example, let's say we ask [GPT-OSS](https://huggingface.co/openai/gpt-oss-120b)
|
||||
what the weather is like, and it thinks and decides to call a tool. Here's what the raw model output might look like:
|
||||
|
||||
```txt
|
||||
<|start|>analysis<|message|>The user asks: "What is the weather like in SF?" We need to get the location of the user? The user explicitly asks about SF (San Francisco).
|
||||
So we need to get the current weather in San Francisco, CA. We need to call get_current_weather function. But we need to call function to get weather data.
|
||||
So we should call get_current_weather with location "San Francisco, CA". Let's do that.
|
||||
We will call function get_current_weather.<|end|><|start|>commentary to=functions.get_current_weather<|channel|>commentary <|constrain|>json<|message|>{"location":"San Francisco, CA"}<|call|>
|
||||
|
||||
```
|
||||
|
||||
But if you want to append this to a chat, you'll need to format it as a chat message dict, like this:
|
||||
|
||||
```json
|
||||
{
|
||||
"role": "assistant",
|
||||
"thinking": "The user asks: \"What is the weather like in SF?\" We need to get the location of the user? The user explicitly asks about SF (San Francisco). So we need to get the current weather in San Francisco, CA. We need to call get_current_weather function. But we need to call function to get weather data. So we should call get_current_weather with location \"San Francisco, CA\". Let's do that.",
|
||||
"tool_calls": [
|
||||
{
|
||||
"name": "get_current_weather",
|
||||
"arguments": {
|
||||
"location": "San Francisco, CA"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Chat **templates** give us a way to turn messages into formatted input for a model, but we need something else to
|
||||
parse model output back into a standard message dict. This is what chat **parsing** is for.
|
||||
|
||||
## The [parse_response](~PreTrainedTokenizerBase.parse_response) method
|
||||
|
||||
Parsing a chat response on a model that supports it is straightforward. Simply take the raw, decoded output from
|
||||
[generate](`~generation.GenerationMixin.generate`), and pass it to the tokenizer's [parse_response](~PreTrainedTokenizerBase.parse_response) method:
|
||||
|
||||
```python
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||
|
||||
checkpoint = "HuggingFaceTB/SmolLM3-3B"
|
||||
|
||||
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
|
||||
model = AutoModelForCausalLM.from_pretrained(checkpoint, dtype="auto", device_map="auto")
|
||||
|
||||
messages = [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Hey! Can you summarize the end of the Cold War as briefly as possible? Like, comically briefly. It should really leave out almost all of the relevant information."
|
||||
}
|
||||
]
|
||||
|
||||
input_ids = tokenizer.apply_chat_template(
|
||||
messages,
|
||||
add_generation_prompt=True,
|
||||
tokenize=True,
|
||||
return_tensors="pt"
|
||||
).to(model.device)
|
||||
|
||||
outputs = model.generate(input_ids, max_new_tokens=1024)[0, input_ids.shape[1]:]
|
||||
out_text = tokenizer.decode(outputs)
|
||||
parsed = tokenizer.parse_response(out_text)
|
||||
print(parsed.keys())
|
||||
```
|
||||
|
||||
And you should get:
|
||||
|
||||
```text
|
||||
dict_keys(['thinking', 'content'])
|
||||
```
|
||||
|
||||
And that's all you need to start using response parsing! `parse_response` should return a complete message dict that is ready to be appended to the chat history.
|
||||
When the tokenizer does not support response parsing, `parse_response` will throw an error. We hope to add support
|
||||
to more tokenizers over time.
|
||||
|
||||
## Developers: Understanding a simple response schema
|
||||
|
||||
Under the hood, `parse_response` uses a **JSON schema** to parse the model output. A JSON schema represents
|
||||
the structure of the output message dict. The schema is augmented with additional fields that indicate how the
|
||||
output message string should be parsed into the expected format. Let's take a look at the schema for a SmolLM response,
|
||||
excluding tool calls for now:
|
||||
|
||||
```python
|
||||
{
|
||||
"x-regex": r"(?:<think>\n?(?P<thinking>.+?)\n?</think>)?\s*(?P<content>.+?)?\s*(?:<\|im_end\|>|$)",
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"role": {"const": "assistant"},
|
||||
"content": {"type": "string"},
|
||||
"thinking": {"type": "string"}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
We can see that the schema describes a JSON "object" (a `dict`, in other words) with three keys: `role`, `content`, and `thinking`.
|
||||
Because all assistant responses have the role "assistant", the `role` key is a `const`(ant). The other two keys are strings, extracted
|
||||
from the named groups in the regex in the `x-regex` field.
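To make the mapping from named groups to message keys concrete, here is a minimal sketch that applies the same regex with Python's `re` module. The helper name `parse_simple_response`, the sample output string, and the use of `re.DOTALL` are assumptions made for illustration; this is not the library's internal implementation.

```python
import re

# Same pattern as the `x-regex` field above, written as a raw string.
X_REGEX = r"(?:<think>\n?(?P<thinking>.+?)\n?</think>)?\s*(?P<content>.+?)?\s*(?:<\|im_end\|>|$)"


def parse_simple_response(text: str) -> dict:
    """Build an assistant message dict from the named groups of the regex."""
    message = {"role": "assistant"}  # `role` is a const in the schema
    match = re.search(X_REGEX, text, flags=re.DOTALL)
    if match:
        # Groups that did not participate in the match are None and are skipped.
        message.update({k: v for k, v in match.groupdict().items() if v is not None})
    return message


# Hypothetical decoded model output.
raw = "<think>\nKeep it short.\n</think>\nThe wall fell.<|im_end|>"
parsed = parse_simple_response(raw)
print(parsed)
```

Each named group becomes one key of the message, which is why the group names in the regex have to match the property names in the schema.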
|
||||
|
||||
Like chat templates, response schemas are set as a property of the tokenizer. To enable response parsing, all you need
|
||||
to do is set `tokenizer.response_schema` to a valid schema dict, and `tokenizer.parse_response()` will work! Again, like
|
||||
chat templates, this schema will be saved with the processor, so once you set it, you can use `save_pretrained()` or `push_to_hub()` to
|
||||
save and share the schema.
|
||||
|
||||
## Developers: Complex schemas
|
||||
|
||||
Now, let's look at a more complex schema, which includes tool calls, to gain more of an understanding of the parser
|
||||
internals. For this, we'll use the `GPT-OSS` schema. GPT-OSS emits both tool calls and thinking blocks, and it uses
|
||||
an unusual format where model responses are tagged with one of three "channels": `commentary` for things like
|
||||
tool calls, `analysis` for chain of thought blocks, and `final` for messages intended to be sent to the user.
|
||||
A full message where the model calls a tool named `get_current_weather` might look like this, with some extra linebreaks added for clarity:
|
||||
|
||||
```text
|
||||
<|channel|>analysis<|message|>
|
||||
The user asks: "What is the weather like in SF?" So we need to get the current weather in San Francisco, CA.
|
||||
We need to call get_current_weather function. So we should call get_current_weather with location "San Francisco, CA".
|
||||
<|end|>
|
||||
<|start|>assistant<|channel|>commentary
|
||||
to=functions.get_current_weather <|constrain|>json<|message|>
|
||||
{
|
||||
"location": "San Francisco, CA"
|
||||
}
|
||||
<|call|>
|
||||
```
|
||||
|
||||
Parsing proceeds recursively; the output of a regex (or other parser) at one level becomes the input to the nodes below it.
|
||||
In other words, don't feel like you have to parse the entire output in one enormous regex! Instead, start with the schema,
|
||||
and then add regexes to extract the relevant chunks as you go. Here's a schema that will parse it, with some
|
||||
explanatory comments:
|
||||
|
||||
```python
|
||||
{
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"role": {"const": "assistant"},
|
||||
# "content" and "thinking" are both similar to the previous example, and just extract a single string
|
||||
# However, rather than using a single regex with named groups to extract both, we use a regex in each subkey.
|
||||
# When an object node has no parser/regex, the entire input string is passed to all of its children, so
|
||||
# parsing can either be done with named groups at the object level, or with separate regexes at the property level.
|
||||
"content": {"type": "string", "x-regex": r"<\|channel\|>final<\|message\|>(.*?)(?:<\|end\|>|$)"},
|
||||
"thinking": {"type": "string", "x-regex": r"<\|channel\|>analysis<\|message\|>(.*?)<\|end\|>"},
|
||||
"tool_calls": {
|
||||
# "x-regex-iterator" uses re.findall to find multiple possible matches, and returns them as an
|
||||
# array/list. You don't need to worry about array handling, though - each item in the array will be
|
||||
# parsed by the `items` schema, so just write the schema for a single item.
|
||||
"x-regex-iterator": r"<\|channel\|>commentary (to=functions\..*?<\|message\|>.*?)(?:<\|call\|>|$)",
|
||||
"type": "array",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
# A const property is a fixed value, and the input has no effect on it.
|
||||
"type": {"const": "function"},
|
||||
# Here, we wrap the entire tool call dict in a `{"function": ...}` block. The input string is passed through to it unchanged.
|
||||
"function": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"name": {"type": "string", "x-regex": r"^to=functions\.(\w+)"},
|
||||
"arguments": {
|
||||
"type": "object",
|
||||
"x-regex": r"<\|message\|>(.*)",
|
||||
# The "x-parser" field indicates that the extracted string should be parsed as JSON.
|
||||
# The output is then passed to the schema nodes below and recursive parsing continues.
|
||||
"x-parser": "json",
|
||||
"additionalProperties": {"type": "any"},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
},
|
||||
}
|
||||
```
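The tool-call branch of this schema can be traced by hand with plain `re` and `json` calls. The sketch below is an illustration only: the raw string is a constructed single-line example containing two tool calls, not captured model output, and the loop mimics the schema rather than calling the Transformers parser.

```python
import json
import re

# The three regexes from the schema above.
ITERATOR = r"<\|channel\|>commentary (to=functions\..*?<\|message\|>.*?)(?:<\|call\|>|$)"
NAME = r"^to=functions\.(\w+)"
ARGUMENTS = r"<\|message\|>(.*)"

# Constructed sample with two tool calls (an assumption for illustration).
raw = (
    "<|start|>assistant<|channel|>commentary to=functions.get_current_weather "
    '<|constrain|>json<|message|>{"location": "San Francisco, CA"}<|call|>'
    "<|start|>assistant<|channel|>commentary to=functions.get_current_weather "
    '<|constrain|>json<|message|>{"location": "Paris, France"}<|call|>'
)

tool_calls = []
# re.findall returns one string per match of the single capturing group.
for chunk in re.findall(ITERATOR, raw):
    name = re.search(NAME, chunk).group(1)
    arguments = json.loads(re.search(ARGUMENTS, chunk).group(1))
    tool_calls.append({"type": "function", "function": {"name": name, "arguments": arguments}})

print(tool_calls)
```

Each chunk found by the iterator regex is handled independently, which mirrors how every array item is parsed by the `items` schema.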
|
||||
|
||||
## Developers: Understanding the parser logic
|
||||
|
||||
The parser follows a few simple rules:
|
||||
|
||||
1. Each level of the schema receives input from the level above, applies any regex or parser it has, and then passes the output to its children.
|
||||
2. The root level receives the entire decoded model output string as input.
|
||||
3. If a node has structured content after parsing (for example, if the regex has named groups and returns a dict, or if the parser returns a dict or list),
|
||||
then that structured content is mapped to the node's children, and each child node receives its corresponding value as input.
|
||||
4. If an `object` (dict) node has unstructured (string) output, then the entire string is passed to all of its children. This allows child nodes
|
||||
to handle parsing individually rather than requiring a single parent regex to extract all keys at once.
|
||||
5. If an `array` (list) node has unstructured (string) output, then this throws an error.
|
||||
|
||||
There is a small set of allowable `x-` keys that indicate how parsing should be done at each node:
|
||||
- `x-regex`: A regex string to apply to the input. If the regex has named groups, the output is a dict of group names to values. Named groups should only be used in `object` nodes.
|
||||
Otherwise, the regex must have exactly one unnamed capturing group, and the output is the value of that group as a string.
|
||||
- `x-regex-iterator`: A regex string to apply to the input using `re.findall()`. The output is a list of all matches.
|
||||
This should only be used in `array` nodes, and the regex must have exactly one unnamed capturing group. The output is distributed to
|
||||
the node's `items` schema.
|
||||
- `x-parser`: Calls a built-in parser to apply to the input. Currently, the only supported parser is `json`, which parses the input string as JSON.
|
||||
The output is passed to the child nodes for further parsing. Note that the `json` parser can return deeply nested output - in this case, the output
|
||||
will be progressively unwrapped as it is passed through child nodes. The child nodes do not need additional `x-parser` or `x-regex` fields in this case,
|
||||
but their structure must match the structure of the parsed JSON.
|
||||
- `x-parser-args`: Only allowed in conjunction with `x-parser`. This is a dict of additional arguments that control parsing. Right now, the only supported
|
||||
argument is `transform`, which specifies a `jmespath` transformation to apply to the output. This is useful when the JSON parser returns a structure
|
||||
that needs to be modified to match the schema.
|
||||
- `x-regex-key-value`: This is rarely necessary, but it can be useful when parsing key-value pairs in non-JSON format where the names of the keys are not known
|
||||
in advance, such as when a model emits XML tool calls with arbitrary argument names. The regex must have exactly two named capturing groups,
|
||||
`key` and `value`, and the output is a dict mapping keys to values. This should only be used in `object` nodes.
|
||||
|
||||
In general, multiple regexes/parsers cannot be combined at the same level. The exception is that `x-regex`, returning a single string, can be combined with the other parsers. In this case,
|
||||
`x-regex` is applied first, and then the output is passed to the other parser, either `x-regex-iterator`, `x-parser`, or `x-regex-key-value`.
|
||||
|
||||
Putting these ideas together, you can see that the input flows through the schema, being parsed at each level and then distributed to child nodes. Each level
|
||||
only needs to extract the input content that is relevant for that part of the schema, and can then let its child nodes handle the rest. Internally, this is handled
|
||||
with a parser function that receives input, applies any regexes/parsers at the current level, then maps the result to its child nodes before recursively calling itself on each of them.
|
||||
Recursion terminates when it reaches leaf nodes, usually primitive types like `string` or `number`, which simply return the input they receive.
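The rules above can be sketched as a short recursive function. This is an illustrative approximation written for this explanation, not the parser shipped in Transformers: the function name `parse_node`, the use of `re.DOTALL`, the handling of `additionalProperties`, and the choice to drop keys whose regex does not match are all assumptions. It covers `x-regex`, `x-regex-iterator`, and the `json` parser only.

```python
import json
import re


def parse_node(schema, data):
    """Parse `data` according to `schema`, following the rules listed above."""
    if "const" in schema:
        return schema["const"]  # fixed value; the input has no effect on it
    if data is None:
        return None

    # `x-regex` runs first so that its output can feed another parser.
    if "x-regex" in schema:
        match = re.search(schema["x-regex"], data, flags=re.DOTALL)
        if match is None:
            return None
        if match.re.groupindex:  # named groups give a dict of name -> value
            data = {k: v for k, v in match.groupdict().items() if v is not None}
        else:  # otherwise take the single unnamed capturing group
            data = match.group(1)
        if data is None:
            return None
    if "x-regex-iterator" in schema:
        data = re.findall(schema["x-regex-iterator"], data, flags=re.DOTALL)
    elif schema.get("x-parser") == "json":
        data = json.loads(data)

    node_type = schema.get("type")
    if node_type == "object":
        result = {}
        for key, child in schema.get("properties", {}).items():
            # Structured input is split by key; a plain string goes to every child.
            child_input = data.get(key) if isinstance(data, dict) else data
            value = parse_node(child, child_input)
            if value is not None:
                result[key] = value
        if isinstance(data, dict) and "additionalProperties" in schema:
            for key, value in data.items():
                result.setdefault(key, value)  # kept as-is in this sketch
        return result
    if node_type == "array":
        if not isinstance(data, list):
            raise TypeError("an array node needs structured (list) input")
        return [parse_node(schema["items"], item) for item in data]
    return data  # leaf node: return the input it received


schema = {
    "type": "object",
    "properties": {
        "role": {"const": "assistant"},
        "thinking": {"type": "string", "x-regex": r"<\|channel\|>analysis<\|message\|>(.*?)<\|end\|>"},
        "tool_calls": {
            "type": "array",
            "x-regex-iterator": r"<\|channel\|>commentary (to=functions\..*?<\|message\|>.*?)(?:<\|call\|>|$)",
            "items": {
                "type": "object",
                "properties": {
                    "type": {"const": "function"},
                    "function": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string", "x-regex": r"^to=functions\.(\w+)"},
                            "arguments": {
                                "type": "object",
                                "x-regex": r"<\|message\|>(.*)",
                                "x-parser": "json",
                                "additionalProperties": {"type": "any"},
                            },
                        },
                    },
                },
            },
        },
    },
}

# Constructed single-line sample (an assumption for illustration).
raw = (
    "<|channel|>analysis<|message|>We should call get_current_weather.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.get_current_weather "
    '<|constrain|>json<|message|>{"location": "San Francisco, CA"}<|call|>'
)
result = parse_node(schema, raw)
print(result)
```

The root node has no regex, so the whole string reaches `thinking` and `tool_calls` unchanged (rule 4), while the `arguments` node chains `x-regex` into the `json` parser as described for combined parsers.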
|
@ -88,16 +88,16 @@ processed_outputs = processor.post_process_keypoint_matching(outputs, image_size
|
||||
import torch
|
||||
from PIL import Image
|
||||
import requests
|
||||
|
||||
|
||||
processor = AutoImageProcessor.from_pretrained("ETH-CVG/lightglue_superpoint")
|
||||
model = AutoModel.from_pretrained("ETH-CVG/lightglue_superpoint")
|
||||
|
||||
|
||||
# LightGlue requires pairs of images
|
||||
images = [image1, image2]
|
||||
inputs = processor(images, return_tensors="pt")
|
||||
with torch.inference_mode():
|
||||
    outputs = model(**inputs)
|
||||
|
||||
|
||||
# Extract matching information
|
||||
keypoints0 = outputs.keypoints0 # Keypoints in first image
|
||||
keypoints1 = outputs.keypoints1 # Keypoints in second image
|
||||
@ -112,7 +112,7 @@ processed_outputs = processor.post_process_keypoint_matching(outputs, image_size
|
||||
# Process outputs for visualization
|
||||
image_sizes = [[(image.height, image.width) for image in images]]
|
||||
processed_outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
|
||||
|
||||
|
||||
for i, output in enumerate(processed_outputs):
|
||||
    print(f"For the image pair {i}")
    for keypoint0, keypoint1, matching_score in zip(
|
||||
@ -147,13 +147,6 @@ processed_outputs = processor.post_process_keypoint_matching(outputs, image_size
|
||||
- post_process_keypoint_matching
|
||||
- visualize_keypoint_matching
|
||||
|
||||
## LightGlueImageProcessorFast
|
||||
|
||||
[[autodoc]] LightGlueImageProcessorFast
|
||||
- preprocess
|
||||
- post_process_keypoint_matching
|
||||
- visualize_keypoint_matching
|
||||
|
||||
## LightGlueForKeypointMatching
|
||||
|
||||
[[autodoc]] LightGlueForKeypointMatching
|
||||
|
@ -33,7 +33,7 @@ Export a Transformers model to ONNX with the Optimum CLI or the `optimum.onnxrun
|
||||
Run the command below to install Optimum and the [exporters](https://huggingface.co/docs/optimum/exporters/overview) module.
|
||||
|
||||
```bash
|
||||
pip install optimum-onnx
|
||||
pip install optimum[exporters]
|
||||
```
|
||||
|
||||
> [!TIP]
|
||||
|
50
docs/source/ja/main_classes/onnx.md
Normal file
@ -0,0 +1,50 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Exporting 🤗 Transformers models to ONNX
|
||||
|
||||
🤗 Transformers provides a `transformers.onnx` package that lets you convert model checkpoints to an ONNX graph by leveraging configuration objects.
|
||||
|
||||
For more details, refer to the [guide](../serialization).
|
||||
|
||||
## ONNX Configurations
|
||||
|
||||
We provide three abstract classes that you should inherit from, depending on the type of model architecture you want to export:
|
||||
|
||||
* Encoder-based models inherit from [`~onnx.config.OnnxConfig`].
* Decoder-based models inherit from [`~onnx.config.OnnxConfigWithPast`].
* Encoder-decoder models inherit from [`~onnx.config.OnnxSeq2SeqConfigWithPast`].
|
||||
|
||||
|
||||
### OnnxConfig
|
||||
|
||||
[[autodoc]] onnx.config.OnnxConfig
|
||||
|
||||
### OnnxConfigWithPast
|
||||
|
||||
[[autodoc]] onnx.config.OnnxConfigWithPast
|
||||
|
||||
### OnnxSeq2SeqConfigWithPast
|
||||
|
||||
[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast
|
||||
|
||||
## ONNX Features
|
||||
|
||||
Each ONNX configuration is associated with a set of _features_ that let you export models for different types of topologies or tasks.
|
@ -47,7 +47,7 @@ ONNX形式にエクスポートされたモデルは、以下のように使用
|
||||
To export a 🤗 Transformers model to ONNX, first install an extra dependency:
|
||||
|
||||
```bash
|
||||
pip install optimum-onnx
|
||||
pip install optimum[exporters]
|
||||
```
|
||||
|
||||
To see all available arguments, refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli). Alternatively, you can view the help in the command line:
|
||||
@ -128,3 +128,64 @@ CLIの代わりに、🤗 TransformersモデルをONNXにプログラム的に
|
||||
### Exporting a model for an unsupported architecture
|
||||
|
||||
If you want to contribute by adding support for a model that cannot currently be exported, first check whether it is supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview), and if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute).
|
||||
|
||||
### Exporting a model with `transformers.onnx`
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
`transformers.onnx` is no longer maintained, so please export models with 🤗 Optimum as described above. This section will be removed in a future version.
|
||||
|
||||
</Tip>
|
||||
|
||||
To export a 🤗 Transformers model to ONNX, install the extra dependencies:
|
||||
|
||||
|
||||
```bash
|
||||
pip install transformers[onnx]
|
||||
```
|
||||
|
||||
Use the `transformers.onnx` package as a Python module to export a checkpoint using a ready-made configuration, as follows:
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
|
||||
```
|
||||
|
||||
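This exports an ONNX graph of the checkpoint defined by the `--model` argument. You can pass any checkpoint on the 🤗 Hub or one that is stored locally. The exported `model.onnx` file can be run on many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows: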
|
||||
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoTokenizer
|
||||
>>> from onnxruntime import InferenceSession
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
|
||||
>>> session = InferenceSession("onnx/model.onnx")
|
||||
>>> # ONNX Runtime expects NumPy arrays as input
|
||||
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
|
||||
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
|
||||
```
|
||||
|
||||
The required output names (for example `["last_hidden_state"]`) can be obtained by looking at the ONNX configuration of each model. For example, for DistilBERT we have:
|
||||
|
||||
|
||||
```python
|
||||
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
|
||||
|
||||
>>> config = DistilBertConfig()
|
||||
>>> onnx_config = DistilBertOnnxConfig(config)
|
||||
>>> print(list(onnx_config.outputs.keys()))
|
||||
["last_hidden_state"]
|
||||
```
|
||||
|
||||
The process for exporting a pure TensorFlow checkpoint from the Hub is similar, as shown below:
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
|
||||
```
|
||||
|
||||
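To export a model that is stored locally, save the model's weights and tokenizer files in the same directory (for example `local-pt-checkpoint`). Then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to the desired directory: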
|
||||
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=local-pt-checkpoint onnx/
|
||||
```
|
||||
|
||||
|
45
docs/source/ko/main_classes/onnx.md
Normal file
@ -0,0 +1,45 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Exporting 🤗 Transformers models to ONNX[[exporting--transformers-models-to-onnx]]
|
||||
|
||||
🤗 Transformers provides the `transformers.onnx` package, which lets you convert model checkpoints to an ONNX graph by leveraging configuration objects.
|
||||
|
||||
For more details, refer to [this guide](../serialization).
|
||||
|
||||
## ONNX configurations[[onnx-configurations]]
|
||||
|
||||
We provide three abstract classes that you should inherit from, depending on the type of model architecture you want to export:
|
||||
|
||||
* Encoder-based models inherit from [`~onnx.config.OnnxConfig`].
* Decoder-based models inherit from [`~onnx.config.OnnxConfigWithPast`].
* Encoder-decoder models inherit from [`~onnx.config.OnnxSeq2SeqConfigWithPast`].
|
||||
|
||||
### OnnxConfig[[transformers.onnx.OnnxConfig]]
|
||||
|
||||
[[autodoc]] onnx.config.OnnxConfig
|
||||
|
||||
### OnnxConfigWithPast[[transformers.onnx.OnnxConfigWithPast]]
|
||||
|
||||
[[autodoc]] onnx.config.OnnxConfigWithPast
|
||||
|
||||
### OnnxSeq2SeqConfigWithPast[[OnnxSeq2SeqConfigWithPast]]
|
||||
|
||||
[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast
|
||||
|
||||
## ONNX features[[onnx-features]]
|
||||
|
||||
Each ONNX configuration is associated with a set of _features_ that let you export models for different types of topologies or tasks.
|
@ -47,7 +47,7 @@ ONNX 형식으로 내보낸 모델은 다음과 같이 사용할 수 있습니
|
||||
To export a 🤗 Transformers model to ONNX, first install an extra dependency:
|
||||
|
||||
```bash
|
||||
pip install optimum-onnx
|
||||
pip install optimum[exporters]
|
||||
```
|
||||
|
||||
To see all available arguments, refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli), or view the help in the command line.
|
||||
@ -123,3 +123,59 @@ CLI 대신에 `optimum.onnxruntime`을 사용하여 프로그래밍 방식으로
|
||||
### Exporting a model for an unsupported architecture [[exporting-a-model-for-an-unsupported-architecture]]
|
||||
|
||||
If you want to contribute by adding support for a model that cannot currently be exported, first check whether it is supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview), and if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute).
|
||||
|
||||
### Exporting a model with `transformers.onnx` [[exporting-a-model-with-transformersonnx]]
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
`transformers.onnx` is no longer maintained. Please export models with 🤗 Optimum as described above. This section will be removed in a future version.
|
||||
|
||||
</Tip>
|
||||
|
||||
To export a 🤗 Transformers model to ONNX, install the extra dependencies:
|
||||
|
||||
```bash
|
||||
pip install transformers[onnx]
|
||||
```
|
||||
|
||||
Use the `transformers.onnx` package as a Python module to export a checkpoint using a ready-made configuration:
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
|
||||
```
|
||||
|
||||
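This exports an ONNX graph of the checkpoint defined by the `--model` argument. You can pass any checkpoint on the 🤗 Hub or one that is stored locally. The resulting `model.onnx` file can be run on one of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows: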
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoTokenizer
|
||||
>>> from onnxruntime import InferenceSession
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
|
||||
>>> session = InferenceSession("onnx/model.onnx")
|
||||
>>> # ONNX Runtime expects NumPy arrays as input
|
||||
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
|
||||
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
|
||||
```
|
||||
|
||||
The required output names (for example, `["last_hidden_state"]`) can be obtained by inspecting each model's ONNX configuration. For DistilBERT, for example:
|
||||
|
||||
```python
|
||||
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
|
||||
|
||||
>>> config = DistilBertConfig()
|
||||
>>> onnx_config = DistilBertOnnxConfig(config)
|
||||
>>> print(list(onnx_config.outputs.keys()))
|
||||
["last_hidden_state"]
|
||||
```
|
||||
|
||||
The same process applies to TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint as follows:
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
|
||||
```
|
||||
|
||||
To export a locally stored model, save the model's weight files and tokenizer files in the same directory, then export to ONNX by pointing the `--model` argument of the `transformers.onnx` package at that directory:
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=local-pt-checkpoint onnx/
|
||||
```
|
45
docs/source/zh/main_classes/onnx.md
Normal file
@ -0,0 +1,45 @@
|
||||
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
|
||||
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
|
||||
rendered properly in your Markdown viewer.
|
||||
|
||||
-->
|
||||
|
||||
# Exporting 🤗 Transformers models to ONNX
|
||||
|
||||
🤗 Transformers provides a `transformers.onnx` package that lets you convert model checkpoints into an ONNX graph by leveraging configuration objects.
|
||||
|
||||
For more details, see the [guide](../serialization) on exporting 🤗 Transformers models.
|
||||
|
||||
## ONNX Configurations
|
||||
|
||||
We provide three abstract classes to inherit from, depending on the type of model architecture you want to export:
|
||||
|
||||
* Encoder-based models inherit from [`~onnx.config.OnnxConfig`]
|
||||
* Decoder-based models inherit from [`~onnx.config.OnnxConfigWithPast`]
|
||||
* Encoder-decoder models inherit from [`~onnx.config.OnnxSeq2SeqConfigWithPast`]
|
||||
|
||||
### OnnxConfig
|
||||
|
||||
[[autodoc]] onnx.config.OnnxConfig
|
||||
|
||||
### OnnxConfigWithPast
|
||||
|
||||
[[autodoc]] onnx.config.OnnxConfigWithPast
|
||||
|
||||
### OnnxSeq2SeqConfigWithPast
|
||||
|
||||
[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast
|
||||
|
||||
## ONNX Features
|
||||
|
||||
Each ONNX configuration is associated with a set of _features_ that enable you to export models for different types of topologies or tasks.
|
@ -47,7 +47,7 @@ rendered properly in your Markdown viewer.
|
||||
To export a 🤗 Transformers model to ONNX, first install the extra dependency:
|
||||
|
||||
```bash
|
||||
pip install optimum-onnx
|
||||
pip install optimum[exporters]
|
||||
```
|
||||
|
||||
See the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) for all available arguments, or view the help on the command line:
|
||||
@ -117,3 +117,53 @@ optimum-cli export onnx --model local_path --task question-answering distilbert_
|
||||
### Exporting a model for an unsupported architecture
|
||||
|
||||
If you want to add support for a model that cannot currently be exported, first check whether it is supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview); if it is not, you can [contribute to 🤗 Optimum directly](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute).
|
||||
|
||||
### Exporting a model with `transformers.onnx`
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
`transformers.onnx` is no longer maintained. Export models with 🤗 Optimum as described above. This section will be removed in a future version.
|
||||
|
||||
</Tip>
|
||||
|
||||
To export a 🤗 Transformers model to ONNX with `transformers.onnx`, install the extra dependencies:
|
||||
|
||||
```bash
|
||||
pip install transformers[onnx]
|
||||
```
|
||||
|
||||
Use the `transformers.onnx` package as a Python module to export a checkpoint with a ready-made configuration:
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
|
||||
```
|
||||
|
||||
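The command above exports an ONNX graph of the checkpoint defined by the `--model` argument. You can pass any checkpoint on the 🤗 Hub or one stored locally. The resulting `model.onnx` file can be run on any of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows: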
|
||||
|
||||
```python
|
||||
>>> from transformers import AutoTokenizer
|
||||
>>> from onnxruntime import InferenceSession
|
||||
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
|
||||
>>> session = InferenceSession("onnx/model.onnx")
|
||||
>>> # ONNX Runtime expects NumPy arrays as input
|
||||
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
|
||||
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
|
||||
```
|
||||
|
||||
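The required output names (for example, `["last_hidden_state"]`) can be obtained by looking at each model's ONNX configuration. For DistilBERT, for example, the following code retrieves the output names: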
|
||||
|
||||
```python
|
||||
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
|
||||
|
||||
>>> config = DistilBertConfig()
|
||||
>>> onnx_config = DistilBertOnnxConfig(config)
|
||||
>>> print(list(onnx_config.outputs.keys()))
|
||||
["last_hidden_state"]
|
||||
```
|
||||
|
||||
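To export a locally stored model, save the model's weights and tokenizer files in the same directory (for example, `local-pt-checkpoint`), then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package at that directory: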
|
||||
|
||||
```bash
|
||||
python -m transformers.onnx --model=local-pt-checkpoint onnx/
|
||||
```
|
||||
|
@ -125,23 +125,15 @@ def token_type_ids_mask_function(
|
||||
# If it's 1 for both query and key/value, we are in an image block
|
||||
# NOTE: static cache shape goes beyond input seq length, while token_type_ids.shape[1] == input seq length
|
||||
# Since vmap doesn't support `if statement` we workaround it with `torch.where`
|
||||
safe_q_idx = torch.where(q_idx < token_type_ids.shape[1], q_idx, 0)
|
||||
safe_kv_idx = torch.where(kv_idx < token_type_ids.shape[1], kv_idx, 0)
|
||||
|
||||
token_type_ids_at_q_idx = token_type_ids[batch_idx, safe_q_idx]
|
||||
token_type_ids_at_q_idx = torch.where(q_idx < token_type_ids.shape[1], token_type_ids_at_q_idx, 0)
|
||||
|
||||
token_type_ids_at_kv_idx = token_type_ids[batch_idx, safe_kv_idx]
|
||||
safe_idx = torch.where(kv_idx < token_type_ids.shape[1], kv_idx, 0)
|
||||
token_type_ids_at_kv_idx = token_type_ids[batch_idx, safe_idx]
|
||||
token_type_ids_at_kv_idx = torch.where(kv_idx < token_type_ids.shape[1], token_type_ids_at_kv_idx, 0)
|
||||
|
||||
image_group_ids_at_q_idx = image_group_ids[batch_idx, safe_q_idx]
|
||||
image_group_ids_at_q_idx = torch.where(q_idx < image_group_ids.shape[1], image_group_ids_at_q_idx, -1)
|
||||
|
||||
image_group_ids_at_kv_idx = image_group_ids[batch_idx, safe_kv_idx]
|
||||
image_group_ids_at_kv_idx = image_group_ids[batch_idx, safe_idx]
|
||||
image_group_ids_at_kv_idx = torch.where(kv_idx < image_group_ids.shape[1], image_group_ids_at_kv_idx, -1)
|
||||
|
||||
is_image_block = (token_type_ids_at_q_idx == 1) & (token_type_ids_at_kv_idx == 1)
|
||||
same_image_block = image_group_ids_at_q_idx == image_group_ids_at_kv_idx
|
||||
is_image_block = (token_type_ids[batch_idx, q_idx] == 1) & (token_type_ids_at_kv_idx == 1)
|
||||
same_image_block = image_group_ids[batch_idx, q_idx] == image_group_ids_at_kv_idx
|
||||
|
||||
# This is bidirectional attention whenever we are dealing with image tokens
|
||||
return is_image_block & same_image_block
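The clamp-then-mask pattern in this hunk can be illustrated without PyTorch. The following is a minimal pure-Python sketch, not the library's code: `lookup_or_default` and `is_same_image_block` are hypothetical helpers, and plain lists stand in for one batch row of the tensors. It shows why an index is first clamped to a valid position and the looked-up value is then replaced by a default when the original index was out of range, which is the role the paired `torch.where` calls play in place of an `if` statement.

```python
def lookup_or_default(row, idx, default):
    """Clamp the index, look up the value, then mask it if idx was out of range.

    Hypothetical helper for illustration; `row` stands in for one batch row
    of `token_type_ids` or `image_group_ids`.
    """
    in_range = idx < len(row)
    # Step 1: replace an out-of-range index with 0 so the lookup is always valid.
    safe_idx = idx if in_range else 0
    value = row[safe_idx]
    # Step 2: discard the looked-up value when the original index was out of range.
    return value if in_range else default


def is_same_image_block(token_type_ids, image_group_ids, q_idx, kv_idx):
    """Return True when both positions are image tokens belonging to the same image."""
    q_type = lookup_or_default(token_type_ids, q_idx, 0)
    kv_type = lookup_or_default(token_type_ids, kv_idx, 0)
    q_group = lookup_or_default(image_group_ids, q_idx, -1)
    kv_group = lookup_or_default(image_group_ids, kv_idx, -1)
    return (q_type == 1) and (kv_type == 1) and (q_group == kv_group)


# One sequence: text, two tokens of image 0, text, one token of image 1.
token_type_ids = [0, 1, 1, 0, 1]
image_group_ids = [-1, 0, 0, -1, 1]
```

An index beyond the sequence length, as can happen with a static cache that is longer than the input, resolves to the default and therefore never counts as part of an image block.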
|
||||
|
3
setup.py
@ -117,7 +117,6 @@ _deps = [
|
||||
"importlib_metadata",
|
||||
"ipadic>=1.0.0,<2.0",
|
||||
"jinja2>=3.1.0",
|
||||
"jmespath>=1.0.1",
|
||||
"kenlm",
|
||||
"kernels>=0.10.2,<0.11",
|
||||
"librosa",
|
||||
@ -295,7 +294,7 @@ extras["num2words"] = deps_list("num2words")
|
||||
extras["sentencepiece"] = deps_list("sentencepiece", "protobuf")
|
||||
extras["tiktoken"] = deps_list("tiktoken", "blobfile")
|
||||
extras["mistral-common"] = deps_list("mistral-common[opencv]")
|
||||
extras["chat_template"] = deps_list("jinja2", "jmespath")
|
||||
extras["chat_template"] = deps_list("jinja2")
|
||||
extras["testing"] = (
|
||||
deps_list(
|
||||
"pytest",
|
||||
|
@ -129,6 +129,8 @@ _import_structure = {
|
||||
],
|
||||
"loss": [],
|
||||
"modelcard": ["ModelCard"],
|
||||
# Models
|
||||
"onnx": [],
|
||||
"pipelines": [
|
||||
"AudioClassificationPipeline",
|
||||
"AutomaticSpeechRecognitionPipeline",
|
||||
|
@ -27,7 +27,6 @@ deps = {
|
||||
"importlib_metadata": "importlib_metadata",
|
||||
"ipadic": "ipadic>=1.0.0,<2.0",
|
||||
"jinja2": "jinja2>=3.1.0",
|
||||
"jmespath": "jmespath>=1.0.1",
|
||||
"kenlm": "kenlm",
|
||||
"kernels": "kernels>=0.10.2,<0.11",
|
||||
"librosa": "librosa",
|
||||
|
@ -27,6 +27,7 @@ from ...utils.metrics import traced
|
||||
logger = logging.getLogger("ContinuousBatchingLogger")
|
||||
|
||||
|
||||
@staticmethod
|
||||
def get_device_and_memory_breakdown() -> tuple[torch.device, int, int, int]:
|
||||
if torch.cuda.is_available():
|
||||
device = torch.device("cuda")
|
||||
|
@ -442,6 +442,75 @@ def normalize(
|
||||
return image
|
||||
|
||||
|
||||
def unnormalize(
|
||||
image: Union[np.ndarray, "torch.Tensor"],
|
||||
mean: Union[float, Collection[float]],
|
||||
std: Union[float, Collection[float]],
|
||||
data_format: Optional[ChannelDimension] = None,
|
||||
input_data_format: Optional[Union[str, ChannelDimension]] = None,
|
||||
) -> np.ndarray:
|
||||
"""
|
||||
Inverse of `normalize`:
|
||||
|
||||
image = image * std + mean
|
||||
|
||||
Args:
|
||||
image (`np.ndarray` or `torch.Tensor`):
|
||||
The image to unnormalize.
|
||||
mean (`float` or `Collection[float]`):
|
||||
The mean to use for unnormalization.
|
||||
std (`float` or `Collection[float]`):
|
||||
The standard deviation to use for unnormalization.
|
||||
data_format (`ChannelDimension`, *optional*):
|
||||
The channel dimension format of the output image. If unset, will use the inferred format from the input.
|
||||
input_data_format (`ChannelDimension`, *optional*):
|
||||
The channel dimension format of the input image. If unset, will use the inferred format from the input.
|
||||
|
||||
Returns:
|
||||
`np.ndarray`: The unnormalized image.
|
||||
"""
|
||||
is_torch_input = isinstance(image, torch.Tensor)
|
||||
if is_torch_input:
|
||||
image = image.detach().cpu().numpy()
|
||||
elif not isinstance(image, np.ndarray):
|
||||
raise TypeError("image must be a numpy array or a torch tensor")
|
||||
|
||||
if input_data_format is None:
|
||||
input_data_format = infer_channel_dimension_format(image)
|
||||
|
||||
if not np.issubdtype(image.dtype, np.floating):
|
||||
image = image.astype(np.float32)
|
||||
|
||||
channel_axis = get_channel_dimension_axis(image, input_data_format=input_data_format)
|
||||
num_channels = image.shape[channel_axis]
|
||||
|
||||
if isinstance(mean, Collection):
|
||||
if len(mean) != num_channels:
|
||||
raise ValueError(f"mean must have {num_channels} elements if it is an iterable, got {len(mean)}")
|
||||
else:
|
||||
mean = [mean] * num_channels
|
||||
mean = np.array(mean, dtype=image.dtype)
|
||||
|
||||
if isinstance(std, Collection):
|
||||
if len(std) != num_channels:
|
||||
raise ValueError(f"std must have {num_channels} elements if it is an iterable, got {len(std)}")
|
||||
else:
|
||||
std = [std] * num_channels
|
||||
std = np.array(std, dtype=image.dtype)
|
||||
|
||||
if input_data_format == ChannelDimension.LAST:
|
||||
image = image * std + mean
|
||||
else:
|
||||
shape = [1] * image.ndim
|
||||
shape[channel_axis] = num_channels
|
||||
mean = mean.reshape(shape)
|
||||
std = std.reshape(shape)
|
||||
image = image * std + mean
|
||||
|
||||
image = to_channel_dimension_format(image, data_format, input_data_format) if data_format is not None else image
|
||||
return image
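The docstring formula `image = image * std + mean` can be checked with a small round trip. This is a minimal pure-Python sketch under stated assumptions, not the function above: nested lists shaped `[channel][row][column]` stand in for a channels-first array, and both helper names and the sample values are illustrative only.

```python
import math


def normalize_channels_first(image, mean, std):
    """Apply (value - mean) / std per channel to a nested list [channel][row][column]."""
    return [
        [[(value - m) / s for value in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


def unnormalize_channels_first(image, mean, std):
    """Apply value * std + mean per channel, undoing normalize_channels_first."""
    return [
        [[value * s + m for value in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


# A 2-channel, 1x2 image with per-channel statistics (illustrative values).
image = [[[0.75, 0.25]], [[0.5, 1.0]]]
mean = [0.5, 0.25]
std = [0.25, 0.5]

normalized = normalize_channels_first(image, mean, std)
restored = unnormalize_channels_first(normalized, mean, std)

# Compare with a tolerance, since floating-point round trips are not always exact.
round_trip_ok = all(
    math.isclose(a, b)
    for channel_a, channel_b in zip(image, restored)
    for row_a, row_b in zip(channel_a, channel_b)
    for a, b in zip(row_a, row_b)
)
```

The per-channel pairing done here with `zip` corresponds to the reshape of `mean` and `std` along the channel axis in the array version.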
|
||||
|
||||
|
||||
def center_crop(
|
||||
image: np.ndarray,
|
||||
size: tuple[int, int],
|
||||
|
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""ALBERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
|
||||
|
||||
class AlbertConfig(PreTrainedConfig):
|
||||
@ -138,4 +142,21 @@ class AlbertConfig(PreTrainedConfig):
|
||||
self.classifier_dropout_prob = classifier_dropout_prob
|
||||
|
||||
|
||||
__all__ = ["AlbertConfig"]
|
||||
# Copied from transformers.models.bert.configuration_bert.BertOnnxConfig with Roberta->Albert
|
||||
class AlbertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["AlbertConfig", "AlbertOnnxConfig"]
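The `inputs` property maps each input name to a dictionary of axis index to symbolic axis name. As a minimal standalone sketch of that mapping (the function `dynamic_axes_for_task` is a hypothetical name, not part of the library):

```python
from collections import OrderedDict


def dynamic_axes_for_task(task, input_names=("input_ids", "attention_mask", "token_type_ids")):
    """Map each input name to {axis index: symbolic axis name} for the given task."""
    if task == "multiple-choice":
        # Multiple-choice inputs carry an extra "choice" axis between batch and sequence.
        dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
    else:
        dynamic_axis = {0: "batch", 1: "sequence"}
    return OrderedDict((name, dynamic_axis) for name in input_names)


default_axes = dynamic_axes_for_task("default")
choice_axes = dynamic_axes_for_task("multiple-choice")
```

Every input shares the same axis dictionary, so the exported graph treats batch size and sequence length as dynamic for all of them.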
|
||||
|
@ -121,7 +121,7 @@ else:
|
||||
("layoutlmv3", ("LayoutLMv3ImageProcessor", "LayoutLMv3ImageProcessorFast")),
|
||||
("levit", ("LevitImageProcessor", "LevitImageProcessorFast")),
|
||||
("lfm2_vl", (None, "Lfm2VlImageProcessorFast")),
|
||||
("lightglue", ("LightGlueImageProcessor", "LightGlueImageProcessorFast")),
|
||||
("lightglue", ("LightGlueImageProcessor", None)),
|
||||
("llama4", ("Llama4ImageProcessor", "Llama4ImageProcessorFast")),
|
||||
("llava", ("LlavaImageProcessor", "LlavaImageProcessorFast")),
|
||||
("llava_next", ("LlavaNextImageProcessor", "LlavaNextImageProcessorFast")),
|
||||
|
@ -1318,7 +1318,7 @@ class BarkFineModel(BarkPreTrainedModel):
|
||||
output sound according to specific predefined voice.
|
||||
"""
|
||||
)
|
||||
class BarkModel(BarkPreTrainedModel, GenerationMixin):
|
||||
class BarkModel(BarkPreTrainedModel):
|
||||
config: BarkConfig
|
||||
|
||||
def __init__(self, config):
|
||||
|
@ -15,9 +15,15 @@
|
||||
"""BART model configuration"""
|
||||
|
||||
import warnings
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any
|
||||
|
||||
from ... import PreTrainedTokenizer
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...utils import logging
|
||||
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...utils import is_torch_available, logging
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
@ -174,4 +180,223 @@ class BartConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BartConfig"]
|
||||
class BartOnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
elif self.task == "causal-lm":
|
||||
# TODO: figure this case out.
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
|
||||
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_outputs = super().outputs
|
||||
else:
|
||||
common_outputs = super(OnnxConfigWithPast, self).outputs
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
return common_outputs
|
||||
|
||||
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
# Generate decoder inputs
|
||||
decoder_seq_length = seq_length if not self.use_past else 1
|
||||
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, decoder_seq_length, is_pair
|
||||
)
|
||||
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
|
||||
common_inputs = dict(**encoder_inputs, **decoder_inputs)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, encoder_seq_length = common_inputs["input_ids"].shape
|
||||
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
|
||||
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
|
||||
encoder_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
encoder_seq_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
decoder_past_length = decoder_seq_length + 3
|
||||
decoder_shape = (
|
||||
batch,
|
||||
num_decoder_attention_heads,
|
||||
decoder_past_length,
|
||||
self._config.hidden_size // num_decoder_attention_heads,
|
||||
)
|
||||
|
||||
common_inputs["decoder_attention_mask"] = torch.cat(
|
||||
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
|
||||
)
|
||||
|
||||
common_inputs["past_key_values"] = []
|
||||
# If the number of encoder and decoder layers are present in the model configuration, both are considered
|
||||
num_encoder_layers, num_decoder_layers = self.num_layers
|
||||
min_num_layers = min(num_encoder_layers, num_decoder_layers)
|
||||
max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
|
||||
remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"
|
||||
|
||||
for _ in range(min_num_layers):
|
||||
common_inputs["past_key_values"].append(
|
||||
(
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
)
|
||||
)
|
||||
# TODO: test this.
|
||||
shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
|
||||
for _ in range(min_num_layers, max_num_layers):
|
||||
common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
|
||||
return common_inputs
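The shape arithmetic for the dummy `past_key_values` in the method above can be followed in isolation. This is a minimal sketch with plain tuples instead of tensors; `dummy_past_shapes` is a hypothetical helper, and the concrete sizes in the usage lines are assumptions chosen for illustration.

```python
def dummy_past_shapes(batch, encoder_seq_length, decoder_seq_length,
                      hidden_size, num_encoder_heads, num_decoder_heads):
    """Return (encoder_shape, decoder_shape) as computed for the dummy past_key_values."""
    encoder_shape = (
        batch,
        num_encoder_heads,
        encoder_seq_length,
        hidden_size // num_encoder_heads,  # per-head dimension
    )
    # In the source, the dummy decoder past is 3 positions longer than the decoder input.
    decoder_past_length = decoder_seq_length + 3
    decoder_shape = (
        batch,
        num_decoder_heads,
        decoder_past_length,
        hidden_size // num_decoder_heads,
    )
    return encoder_shape, decoder_shape


# Illustrative values (assumptions): batch of 2, 8 encoder tokens, 1 decoder token,
# hidden size 768 split across 12 heads on each side.
encoder_shape, decoder_shape = dummy_past_shapes(2, 8, 1, 768, 12, 12)
```

With `use_past` enabled the decoder input has length 1, so the decoder attention mask is extended by `decoder_past_length` to cover the cached positions as well.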
|
||||
|
||||
def _generate_dummy_inputs_for_causal_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
# Not using the same length for past_key_values
|
||||
past_key_values_length = seqlen + 2
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
num_encoder_attention_heads, _ = self.num_attention_heads
|
||||
past_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
|
||||
mask_dtype = common_inputs["attention_mask"].dtype
|
||||
common_inputs["attention_mask"] = torch.cat(
|
||||
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
common_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
|
||||
]
|
||||
return common_inputs
|
||||
|
||||
def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
# Copied from OnnxConfig.generate_dummy_inputs
|
||||
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(
|
||||
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
|
||||
)
|
||||
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
|
||||
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
|
||||
seq_length = compute_effective_axis_dimension(
|
||||
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
|
||||
)
|
||||
|
||||
# Generate dummy inputs according to compute batch and sequence
|
||||
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
|
||||
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
|
||||
return common_inputs
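How the dynamic axis value `-1` turns into concrete dummy inputs can be sketched as follows. The helper below is a hypothetical stand-in for `compute_effective_axis_dimension`, written from the comments in this method; the real helper may differ. The fixed sizes 2 and 8 come from those comments, while the `<unk>` token and the two special tokens are placeholder assumptions.

```python
def effective_axis_dimension(dimension, fixed_dimension, num_token_to_add=0):
    """Resolve a possibly dynamic axis (-1) to a concrete size for dummy inputs.

    Hypothetical stand-in for `compute_effective_axis_dimension`.
    """
    if dimension <= 0:
        dimension = fixed_dimension
    # Leave room for the special tokens the tokenizer will add.
    return dimension - num_token_to_add


batch_size = effective_axis_dimension(-1, fixed_dimension=2)
seq_length = effective_axis_dimension(-1, fixed_dimension=8, num_token_to_add=2)

unk_token = "<unk>"  # placeholder; the real value comes from the tokenizer
# Same expression as in the method: joining a one-element list yields the token itself,
# which is then repeated seq_length times without separators.
dummy_input = [" ".join([unk_token]) * seq_length] * batch_size
```

An explicit positive size is kept as given and only reduced by the number of special tokens, so the tokenized length can match the requested sequence length.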
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
elif self.task == "causal-lm":
|
||||
common_inputs = self._generate_dummy_inputs_for_causal_lm(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
else:
|
||||
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
def _flatten_past_key_values_(self, flattened_output, name, idx, t):
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
|
||||
else:
|
||||
flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
|
||||
flattened_output, name, idx, t
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BartConfig", "BartOnnxConfig"]
|
||||
|
@ -15,8 +15,13 @@
|
||||
"""BEiT model configuration"""
|
||||
|
||||
import warnings
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
|
||||
|
||||
|
||||
@ -204,4 +209,21 @@ class BeitConfig(BackboneConfigMixin, PreTrainedConfig):
|
||||
self.reshape_hidden_states = reshape_hidden_states
|
||||
|
||||
|
||||
__all__ = ["BeitConfig"]
|
||||
# Copied from transformers.models.vit.configuration_vit.ViTOnnxConfig
|
||||
class BeitOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["BeitConfig", "BeitOnnxConfig"]
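`atol_for_validation` returns an absolute tolerance of `1e-4`. As a rough sketch of what such a tolerance means when comparing reference outputs with exported-model outputs (the function `within_atol` and the sample values are assumptions for illustration, not the library's validation code):

```python
def within_atol(reference, candidate, atol=1e-4):
    """Return True when every pair of values differs by at most `atol`."""
    return all(abs(r - c) <= atol for r, c in zip(reference, candidate))


# Illustrative outputs: the exported values drift by at most 5e-5 from the reference.
reference_outputs = [0.12345, -1.5, 3.0]
exported_outputs = [0.12349, -1.50005, 3.0]

outputs_match = within_atol(reference_outputs, exported_outputs)
```

A looser tolerance accepts more numerical drift between the two runtimes; a tighter one can reject exports that differ only by floating-point noise.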
|
||||
|
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""BERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -123,4 +127,20 @@ class BertConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["BertConfig"]
|
||||
class BertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BertConfig", "BertOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""BigBird model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -154,4 +158,19 @@ class BigBirdConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["BigBirdConfig"]
|
||||
class BigBirdOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BigBirdConfig", "BigBirdOnnxConfig"]
|
||||
|
@ -14,8 +14,15 @@
|
||||
# limitations under the License.
|
||||
"""BigBirdPegasus model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any
|
||||
|
||||
from ... import PreTrainedTokenizer
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...utils import logging
|
||||
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...utils import is_torch_available, logging
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
@ -179,4 +186,224 @@ class BigBirdPegasusConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BigBirdPegasusConfig"]
|
||||
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig with Bart->BigBirdPegasus
|
||||
class BigBirdPegasusOnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
elif self.task == "causal-lm":
|
||||
# TODO: figure this case out.
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
|
||||
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_outputs = super().outputs
|
||||
else:
|
||||
common_outputs = super(OnnxConfigWithPast, self).outputs
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
return common_outputs
|
||||
|
||||
    def _generate_dummy_inputs_for_default_and_seq2seq_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        # Generate decoder inputs
        decoder_seq_length = seq_length if not self.use_past else 1
        decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, decoder_seq_length, is_pair
        )
        decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
        common_inputs = dict(**encoder_inputs, **decoder_inputs)

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, encoder_seq_length = common_inputs["input_ids"].shape
            decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
            num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
            encoder_shape = (
                batch,
                num_encoder_attention_heads,
                encoder_seq_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )
            decoder_past_length = decoder_seq_length + 3
            decoder_shape = (
                batch,
                num_decoder_attention_heads,
                decoder_past_length,
                self._config.hidden_size // num_decoder_attention_heads,
            )

            common_inputs["decoder_attention_mask"] = torch.cat(
                [common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
            )

            common_inputs["past_key_values"] = []
            # If the number of encoder and decoder layers are present in the model configuration, both are considered
            num_encoder_layers, num_decoder_layers = self.num_layers
            min_num_layers = min(num_encoder_layers, num_decoder_layers)
            max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
            remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"

            for _ in range(min_num_layers):
                common_inputs["past_key_values"].append(
                    (
                        torch.zeros(decoder_shape),
                        torch.zeros(decoder_shape),
                        torch.zeros(encoder_shape),
                        torch.zeros(encoder_shape),
                    )
                )
            # TODO: test this.
            shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
            for _ in range(min_num_layers, max_num_layers):
                common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
        return common_inputs

    def _generate_dummy_inputs_for_causal_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, seqlen = common_inputs["input_ids"].shape
            # Not using the same length for past_key_values
            past_key_values_length = seqlen + 2
            num_encoder_layers, _ = self.num_layers
            num_encoder_attention_heads, _ = self.num_attention_heads
            past_shape = (
                batch,
                num_encoder_attention_heads,
                past_key_values_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )

            mask_dtype = common_inputs["attention_mask"].dtype
            common_inputs["attention_mask"] = torch.cat(
                [common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )
            common_inputs["past_key_values"] = [
                (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
            ]
        return common_inputs

    def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        # Copied from OnnxConfig.generate_dummy_inputs
        # Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )

        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )

        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
        common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
        return common_inputs

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        elif self.task == "causal-lm":
            common_inputs = self._generate_dummy_inputs_for_causal_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )
        else:
            common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        return common_inputs

    def _flatten_past_key_values_(self, flattened_output, name, idx, t):
        if self.task in ["default", "seq2seq-lm"]:
            flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
        else:
            flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
                flattened_output, name, idx, t
            )


__all__ = ["BigBirdPegasusConfig", "BigBirdPegasusOnnxConfig"]

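The seq2seq dummy-input builder above sizes `past_key_values` from the encoder and decoder layer counts. A minimal sketch of just that bookkeeping, in plain Python with no PyTorch, may make the two loops easier to follow. `count_past_entries` is a hypothetical helper written for illustration and is not part of `transformers`. It mirrors the arithmetic as it appears in the diff, including the fact that `max_num_layers` holds the difference between the two counts rather than the larger count.

```python
def count_past_entries(num_encoder_layers: int, num_decoder_layers: int) -> tuple[int, int]:
    """Return (full_entries, remaining_entries) produced by the two loops.

    Hypothetical helper: it reproduces the loop bounds from
    _generate_dummy_inputs_for_default_and_seq2seq_lm without building tensors.
    """
    min_num_layers = min(num_encoder_layers, num_decoder_layers)
    # As in the source, this is the difference, not the larger layer count.
    max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers

    # First loop: one 4-tuple (decoder key/value, encoder key/value) per layer.
    full_entries = len(range(min_num_layers))
    # Second loop: one 2-tuple per iteration of range(min_num_layers, max_num_layers).
    remaining_entries = len(range(min_num_layers, max_num_layers))
    return full_entries, remaining_entries


print(count_past_entries(6, 6))    # equal depth
print(count_past_entries(12, 6))   # encoder twice as deep
print(count_past_entries(16, 4))   # encoder four times as deep
```

With equal layer counts the second loop never runs. The 12/6 case shows that an unequal configuration can also yield no extra entries, which may be worth keeping in mind alongside the `# TODO: test this.` comment.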
@ -14,7 +14,15 @@
# limitations under the License.
"""Blenderbot model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any

from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...file_utils import is_torch_available
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import logging


@ -158,4 +166,227 @@ class BlenderbotConfig(PreTrainedConfig):
        )


__all__ = ["BlenderbotConfig"]
class BlenderbotOnnxConfig(OnnxSeq2SeqConfigWithPast):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )
            if self.use_past:
                common_inputs["decoder_input_ids"] = {0: "batch"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
            else:
                common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
            if self.use_past:
                self.fill_with_past_key_values_(common_inputs, direction="inputs")
        elif self.task == "causal-lm":
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )
            if self.use_past:
                _, num_decoder_layers = self.num_layers
                for i in range(num_decoder_layers):
                    common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        else:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                    ("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
                    ("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
                ]
            )

        return common_inputs

    @property
    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.outputs
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_outputs = super().outputs
        else:
            common_outputs = super(OnnxConfigWithPast, self).outputs
            if self.use_past:
                num_encoder_layers, _ = self.num_layers
                for i in range(num_encoder_layers):
                    common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        return common_outputs

    def _generate_dummy_inputs_for_default_and_seq2seq_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )
        # Generate decoder inputs
        decoder_seq_length = seq_length if not self.use_past else 1
        decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, decoder_seq_length, is_pair
        )
        decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
        common_inputs = dict(**encoder_inputs, **decoder_inputs)

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, encoder_seq_length = common_inputs["input_ids"].shape
            decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
            num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
            encoder_shape = (
                batch,
                num_encoder_attention_heads,
                encoder_seq_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )
            decoder_past_length = decoder_seq_length
            decoder_shape = (
                batch,
                num_decoder_attention_heads,
                decoder_past_length,
                self._config.hidden_size // num_decoder_attention_heads,
            )
            common_inputs["decoder_attention_mask"] = torch.cat(
                [common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
            )
            common_inputs["past_key_values"] = []
            _, num_decoder_layers = self.num_layers

            for _ in range(num_decoder_layers):
                common_inputs["past_key_values"].append(
                    (
                        torch.zeros(decoder_shape),
                        torch.zeros(decoder_shape),
                        torch.zeros(encoder_shape),
                        torch.zeros(encoder_shape),
                    )
                )
        return common_inputs

    def _generate_dummy_inputs_for_causal_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, seqlen = common_inputs["input_ids"].shape
            past_key_values_length = seqlen
            _, num_decoder_layers = self.num_layers
            num_encoder_attention_heads, _ = self.num_attention_heads
            past_shape = (
                batch,
                num_encoder_attention_heads,
                past_key_values_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )
            mask_dtype = common_inputs["attention_mask"].dtype
            common_inputs["attention_mask"] = torch.cat(
                [common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )
            common_inputs["past_key_values"] = [
                (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_decoder_layers)
            ]
        return common_inputs

    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._generate_dummy_inputs_for_sequence_classification_and_question_answering
    def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        # Copied from OnnxConfig.generate_dummy_inputs
        # Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )

        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )

        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
        common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
        return common_inputs

    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.generate_dummy_inputs
    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        elif self.task == "causal-lm":
            common_inputs = self._generate_dummy_inputs_for_causal_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )
        else:
            common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        return common_inputs

    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._flatten_past_key_values_
    def _flatten_past_key_values_(self, flattened_output, name, idx, t):
        if self.task in ["default", "seq2seq-lm"]:
            flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
        else:
            flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
                flattened_output, name, idx, t
            )

    def fill_with_past_key_values_(self, inputs_or_outputs: Mapping[str, Mapping[int, str]], direction: str):
        if direction not in ["inputs", "outputs"]:
            raise ValueError(f'direction must either be "inputs" or "outputs", but {direction} was given')

        name = "past_key_values" if direction == "inputs" else "present"
        _, num_decoder_layers = self.num_layers

        encoder_sequence = "past_encoder_sequence"
        decoder_sequence = "past_decoder_sequence" if direction == "inputs" else "past_decoder_sequence + sequence"

        for i in range(num_decoder_layers):
            inputs_or_outputs[f"{name}.{i}.decoder.key"] = {0: "batch", 2: decoder_sequence}
            inputs_or_outputs[f"{name}.{i}.decoder.value"] = {0: "batch", 2: decoder_sequence}
            inputs_or_outputs[f"{name}.{i}.encoder.key"] = {0: "batch", 2: encoder_sequence}
            inputs_or_outputs[f"{name}.{i}.encoder.value"] = {0: "batch", 2: encoder_sequence}


__all__ = ["BlenderbotConfig", "BlenderbotOnnxConfig"]

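The `fill_with_past_key_values_` override above names four dynamic-axis entries per decoder layer and changes both the prefix and the decoder axis label depending on `direction`. The sketch below restates that logic as a standalone function so the resulting keys can be inspected without instantiating a config. `past_axes` is a hypothetical name used only here; the real method reads the layer count from `self.num_layers` and mutates the mapping it receives.

```python
from collections import OrderedDict


def past_axes(num_decoder_layers: int, direction: str) -> OrderedDict:
    """Standalone restatement of BlenderbotOnnxConfig.fill_with_past_key_values_.

    Hypothetical free function: it returns a new mapping instead of
    filling the one passed in by the caller.
    """
    if direction not in ["inputs", "outputs"]:
        raise ValueError(f'direction must either be "inputs" or "outputs", but {direction} was given')

    name = "past_key_values" if direction == "inputs" else "present"
    encoder_sequence = "past_encoder_sequence"
    decoder_sequence = "past_decoder_sequence" if direction == "inputs" else "past_decoder_sequence + sequence"

    axes = OrderedDict()
    for i in range(num_decoder_layers):
        axes[f"{name}.{i}.decoder.key"] = {0: "batch", 2: decoder_sequence}
        axes[f"{name}.{i}.decoder.value"] = {0: "batch", 2: decoder_sequence}
        axes[f"{name}.{i}.encoder.key"] = {0: "batch", 2: encoder_sequence}
        axes[f"{name}.{i}.encoder.value"] = {0: "batch", 2: encoder_sequence}
    return axes


# Print the axis names for a single decoder layer in both directions.
for direction in ("inputs", "outputs"):
    for key, value in past_axes(1, direction).items():
        print(key, value)
```

Only the decoder entries grow between input and output: the encoder axis keeps the label `past_encoder_sequence` in both directions.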
@ -14,7 +14,15 @@
# limitations under the License.
"""BlenderbotSmall model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any

from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...file_utils import is_torch_available
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import logging


@ -156,4 +164,224 @@ class BlenderbotSmallConfig(PreTrainedConfig):
        )


__all__ = ["BlenderbotSmallConfig"]
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig with Bart->BlenderbotSmall
class BlenderbotSmallOnnxConfig(OnnxSeq2SeqConfigWithPast):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )

            if self.use_past:
                common_inputs["decoder_input_ids"] = {0: "batch"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
            else:
                common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}

            if self.use_past:
                self.fill_with_past_key_values_(common_inputs, direction="inputs")
        elif self.task == "causal-lm":
            # TODO: figure this case out.
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )
            if self.use_past:
                num_encoder_layers, _ = self.num_layers
                for i in range(num_encoder_layers):
                    common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        else:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                    ("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
                    ("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
                ]
            )

        return common_inputs

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_outputs = super().outputs
        else:
            common_outputs = super(OnnxConfigWithPast, self).outputs
            if self.use_past:
                num_encoder_layers, _ = self.num_layers
                for i in range(num_encoder_layers):
                    common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        return common_outputs

    def _generate_dummy_inputs_for_default_and_seq2seq_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        # Generate decoder inputs
        decoder_seq_length = seq_length if not self.use_past else 1
        decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, decoder_seq_length, is_pair
        )
        decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
        common_inputs = dict(**encoder_inputs, **decoder_inputs)

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, encoder_seq_length = common_inputs["input_ids"].shape
            decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
            num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
            encoder_shape = (
                batch,
                num_encoder_attention_heads,
                encoder_seq_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )
            decoder_past_length = decoder_seq_length + 3
            decoder_shape = (
                batch,
                num_decoder_attention_heads,
                decoder_past_length,
                self._config.hidden_size // num_decoder_attention_heads,
            )

            common_inputs["decoder_attention_mask"] = torch.cat(
                [common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
            )

            common_inputs["past_key_values"] = []
            # If the number of encoder and decoder layers are present in the model configuration, both are considered
            num_encoder_layers, num_decoder_layers = self.num_layers
            min_num_layers = min(num_encoder_layers, num_decoder_layers)
            max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
            remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"

            for _ in range(min_num_layers):
                common_inputs["past_key_values"].append(
                    (
                        torch.zeros(decoder_shape),
                        torch.zeros(decoder_shape),
                        torch.zeros(encoder_shape),
                        torch.zeros(encoder_shape),
                    )
                )
            # TODO: test this.
            shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
            for _ in range(min_num_layers, max_num_layers):
                common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
        return common_inputs

    def _generate_dummy_inputs_for_causal_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, seqlen = common_inputs["input_ids"].shape
            # Not using the same length for past_key_values
            past_key_values_length = seqlen + 2
            num_encoder_layers, _ = self.num_layers
            num_encoder_attention_heads, _ = self.num_attention_heads
            past_shape = (
                batch,
                num_encoder_attention_heads,
                past_key_values_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )

            mask_dtype = common_inputs["attention_mask"].dtype
            common_inputs["attention_mask"] = torch.cat(
                [common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )
            common_inputs["past_key_values"] = [
                (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
            ]
        return common_inputs

    def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        # Copied from OnnxConfig.generate_dummy_inputs
        # Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )

        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )

        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
        common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
        return common_inputs

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        elif self.task == "causal-lm":
            common_inputs = self._generate_dummy_inputs_for_causal_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )
        else:
            common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        return common_inputs

    def _flatten_past_key_values_(self, flattened_output, name, idx, t):
        if self.task in ["default", "seq2seq-lm"]:
            flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
        else:
            flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
                flattened_output, name, idx, t
            )


__all__ = ["BlenderbotSmallConfig", "BlenderbotSmallOnnxConfig"]

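Each of the configs above builds its dummy text the same way: resolve the batch and sequence sizes, then repeat the tokenizer's unknown token. The sketch below walks through that step with plain Python. `effective_axis_dimension` is an assumed stand-in for `compute_effective_axis_dimension`, whose body does not appear in this diff; the fallbacks of 2 samples and 8 tokens come from the comments in the diff, while the special-token count and the `"<unk>"` string are placeholders chosen for illustration.

```python
def effective_axis_dimension(dimension: int, fixed_dimension: int, num_token_to_add: int = 0) -> int:
    """Assumed behaviour of compute_effective_axis_dimension (not copied from the library).

    A non-positive dimension is treated as dynamic and replaced by the fixed
    fallback; the tokens the tokenizer will add are then subtracted.
    """
    if dimension <= 0:
        dimension = fixed_dimension
    return dimension - num_token_to_add


# Fallbacks taken from the comments in the diff: 2 samples, 8 tokens.
batch_size = effective_axis_dimension(-1, fixed_dimension=2)
# Assumption for illustration: the tokenizer adds 2 special tokens to a single sequence.
seq_length = effective_axis_dimension(-1, fixed_dimension=8, num_token_to_add=2)

unk_token = "<unk>"  # placeholder; the real value comes from tokenizer.unk_token
# Joining a one-element list returns that element, so no spaces are inserted.
dummy_input = [" ".join([unk_token]) * seq_length] * batch_size

print(batch_size, seq_length)
print(dummy_input)
```

Because `" ".join` receives a single-element list, the dummy text is the unknown token repeated back to back. How many token ids that yields depends on how the real tokenizer splits the repeated string.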
@ -14,8 +14,19 @@
# limitations under the License.
"""Bloom configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Optional

from packaging import version


if TYPE_CHECKING:
    from ... import PreTrainedTokenizer

from ...configuration_utils import PreTrainedConfig
from ...utils import logging
from ...onnx import OnnxConfigWithPast, PatchingSpec
from ...utils import is_torch_available, logging


logger = logging.get_logger(__name__)
@ -131,4 +142,99 @@ class BloomConfig(PreTrainedConfig):
        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)


__all__ = ["BloomConfig"]
class BloomOnnxConfig(OnnxConfigWithPast):
    torch_onnx_minimum_version = version.parse("1.12")

    def __init__(
        self,
        config: PreTrainedConfig,
        task: str = "default",
        patching_specs: Optional[list[PatchingSpec]] = None,
        use_past: bool = False,
    ):
        super().__init__(config, task=task, patching_specs=patching_specs, use_past=use_past)
        if not getattr(self._config, "pad_token_id", None):
            # TODO: how to do that better?
            self._config.pad_token_id = 0

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
        if self.use_past:
            # BLOOM stores values on dynamic axis 2. For more details see: https://github.com/huggingface/transformers/pull/18344
            self.fill_with_past_key_values_(common_inputs, direction="inputs", inverted_values_shape=True)
            common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
        else:
            common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}

        return common_inputs

    @property
    def num_layers(self) -> int:
        return self._config.n_layer

    @property
    def num_attention_heads(self) -> int:
        return self._config.n_head

    @property
    def atol_for_validation(self) -> float:
        return 1e-3

    def generate_dummy_inputs(
        self,
        tokenizer: "PreTrainedTokenizer",
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
            tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
            is_pair=is_pair,
        )

        # We need to order the inputs in the way they appear in the forward()
        ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})

        # Need to add the past_keys
        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch

                batch, seqlen = common_inputs["input_ids"].shape
                # Not using the same length for past_key_values
                past_key_values_length = seqlen + 2
                head_dim = self._config.hidden_size // self.num_attention_heads
                past_key_shape = (
                    batch * self.num_attention_heads,
                    head_dim,
                    past_key_values_length,
                )
                past_value_shape = (
                    batch * self.num_attention_heads,
                    past_key_values_length,
                    head_dim,
                )
                ordered_inputs["past_key_values"] = [
                    (torch.zeros(past_key_shape), torch.zeros(past_value_shape)) for _ in range(self.num_layers)
                ]

        ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
        if self.use_past:
            mask_dtype = ordered_inputs["attention_mask"].dtype
            ordered_inputs["attention_mask"] = torch.cat(
                [ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )

        return ordered_inputs

    @property
    def default_onnx_opset(self) -> int:
        return 13


__all__ = ["BloomConfig", "BloomOnnxConfig"]

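Unlike the seq2seq configs, the BLOOM config folds the head dimension into the batch axis and uses different layouts for cached keys and values. The following sketch computes those shapes as plain tuples so the transposition is easy to see. `bloom_past_shapes` is a hypothetical helper, and the sizes passed to it are illustrative numbers, not values from any released checkpoint.

```python
def bloom_past_shapes(batch: int, seqlen: int, hidden_size: int, num_attention_heads: int):
    """Compute the dummy past key/value shapes and mask length used for BLOOM.

    Hypothetical helper mirroring BloomOnnxConfig.generate_dummy_inputs with
    use_past=True; it returns plain tuples instead of torch tensors.
    """
    # Not using the same length for past_key_values (as in the source).
    past_key_values_length = seqlen + 2
    head_dim = hidden_size // num_attention_heads
    # Keys keep the sequence on the last axis, values keep it on axis 1.
    past_key_shape = (batch * num_attention_heads, head_dim, past_key_values_length)
    past_value_shape = (batch * num_attention_heads, past_key_values_length, head_dim)
    # The attention mask is extended with ones covering the past positions.
    attention_mask_length = seqlen + past_key_values_length
    return past_key_shape, past_value_shape, attention_mask_length


# Illustrative numbers only.
key_shape, value_shape, mask_len = bloom_past_shapes(batch=2, seqlen=8, hidden_size=64, num_attention_heads=8)
print(key_shape, value_shape, mask_len)
```

The key and value shapes differ only in the order of their last two axes, which matches the `inverted_values_shape=True` argument passed in the `inputs` property.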
@ -15,7 +15,11 @@
# limitations under the License.
"""CamemBERT configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -125,4 +129,19 @@ class CamembertConfig(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["CamembertConfig"]
class CamembertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["CamembertConfig", "CamembertOnnxConfig"]

@ -14,7 +14,16 @@
# limitations under the License.
"""Chinese-CLIP model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any


if TYPE_CHECKING:
    from ...processing_utils import ProcessorMixin

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -359,4 +368,52 @@ class ChineseCLIPConfig(PreTrainedConfig):
        super().__init__(**kwargs)


__all__ = ["ChineseCLIPConfig", "ChineseCLIPTextConfig", "ChineseCLIPVisionConfig"]
class ChineseCLIPOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("logits_per_image", {0: "batch"}),
                ("logits_per_text", {0: "batch"}),
                ("text_embeds", {0: "batch"}),
                ("image_embeds", {0: "batch"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4

def generate_dummy_inputs(
|
||||
self,
|
||||
processor: "ProcessorMixin",
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
) -> Mapping[str, Any]:
|
||||
text_input_dict = super().generate_dummy_inputs(
|
||||
processor.tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
)
|
||||
image_input_dict = super().generate_dummy_inputs(
|
||||
processor.image_processor,
|
||||
batch_size=batch_size,
|
||||
)
|
||||
return {**text_input_dict, **image_input_dict}
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 14
|
||||
|
||||
|
||||
__all__ = ["ChineseCLIPConfig", "ChineseCLIPOnnxConfig", "ChineseCLIPTextConfig", "ChineseCLIPVisionConfig"]
|
||||
|
@ -14,7 +14,16 @@
|
||||
# limitations under the License.
|
||||
"""CLIP model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ...processing_utils import ProcessorMixin
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -355,4 +364,52 @@ class CLIPConfig(PreTrainedConfig):
|
||||
super().__init__(**kwargs)
|
||||
|
||||
|
||||
__all__ = ["CLIPConfig", "CLIPTextConfig", "CLIPVisionConfig"]
|
||||
class CLIPOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "sequence"}),
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
("attention_mask", {0: "batch", 1: "sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("logits_per_image", {0: "batch"}),
|
||||
("logits_per_text", {0: "batch"}),
|
||||
("text_embeds", {0: "batch"}),
|
||||
("image_embeds", {0: "batch"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
processor: "ProcessorMixin",
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
) -> Mapping[str, Any]:
|
||||
text_input_dict = super().generate_dummy_inputs(
|
||||
processor.tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
)
|
||||
image_input_dict = super().generate_dummy_inputs(
|
||||
processor.image_processor,
|
||||
batch_size=batch_size,
|
||||
)
|
||||
return {**text_input_dict, **image_input_dict}
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 14
|
||||
|
||||
|
||||
__all__ = ["CLIPConfig", "CLIPOnnxConfig", "CLIPTextConfig", "CLIPVisionConfig"]
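`generate_dummy_inputs` in the hunk above builds text and image dummy inputs separately and merges them with dict unpacking. The sketch below reproduces only that merge pattern with plain Python lists; `fake_text_inputs`, `fake_image_inputs` and `merged_dummy_inputs` are hypothetical stand-ins, not transformers functions, and the tiny image size is an arbitrary assumption.

```python
def fake_text_inputs(batch_size: int, seq_length: int) -> dict:
    """Hypothetical stand-in for the tokenizer-based dummy inputs."""
    return {
        "input_ids": [[0] * seq_length for _ in range(batch_size)],
        "attention_mask": [[1] * seq_length for _ in range(batch_size)],
    }


def fake_image_inputs(batch_size: int, channels: int = 3, height: int = 2, width: int = 2) -> dict:
    """Hypothetical stand-in for the image-processor-based dummy inputs."""
    image = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    return {"pixel_values": [image for _ in range(batch_size)]}


def merged_dummy_inputs(batch_size: int, seq_length: int) -> dict:
    text_input_dict = fake_text_inputs(batch_size, seq_length)
    image_input_dict = fake_image_inputs(batch_size)
    # Dict unpacking: later keys would win on collision; here the key sets are disjoint.
    return {**text_input_dict, **image_input_dict}


if __name__ == "__main__":
    merged = merged_dummy_inputs(batch_size=2, seq_length=4)
    print(sorted(merged))
```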
|
||||
|
@ -26,17 +26,7 @@ from ...modeling_attn_mask_utils import _create_4d_causal_attention_mask, _prepa
|
||||
from ...modeling_layers import GradientCheckpointingLayer
|
||||
from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling, ImageClassifierOutput
|
||||
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
|
||||
from ...processing_utils import Unpack
|
||||
from ...utils import (
|
||||
ModelOutput,
|
||||
TransformersKwargs,
|
||||
auto_docstring,
|
||||
can_return_tuple,
|
||||
filter_out_non_signature_kwargs,
|
||||
logging,
|
||||
torch_int,
|
||||
)
|
||||
from ...utils.generic import check_model_inputs
|
||||
from ...utils import ModelOutput, auto_docstring, can_return_tuple, filter_out_non_signature_kwargs, logging, torch_int
|
||||
from .configuration_clip import CLIPConfig, CLIPTextConfig, CLIPVisionConfig
|
||||
|
||||
|
||||
@ -270,7 +260,8 @@ def eager_attention_forward(
|
||||
attention_mask: Optional[torch.Tensor],
|
||||
scaling: float,
|
||||
dropout: float = 0.0,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: bool = True,
|
||||
**kwargs,
|
||||
):
|
||||
attn_weights = torch.matmul(query, key.transpose(-1, -2)) * scaling
|
||||
if attention_mask is not None:
|
||||
@ -280,6 +271,8 @@ def eager_attention_forward(
|
||||
|
||||
attn_output = torch.matmul(attn_weights, value)
|
||||
attn_output = attn_output.transpose(1, 2).contiguous()
|
||||
if not output_attentions:
|
||||
attn_weights = None
|
||||
return attn_output, attn_weights
|
||||
|
||||
|
||||
@ -311,7 +304,7 @@ class CLIPAttention(nn.Module):
|
||||
hidden_states: torch.Tensor,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
causal_attention_mask: Optional[torch.Tensor] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: Optional[bool] = False,
|
||||
) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
|
||||
"""Input shape: Batch x Time x Channel"""
|
||||
|
||||
@ -347,12 +340,14 @@ class CLIPAttention(nn.Module):
|
||||
is_causal=self.is_causal,
|
||||
scaling=self.scale,
|
||||
dropout=0.0 if not self.training else self.dropout,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
)
|
||||
|
||||
attn_output = attn_output.reshape(batch_size, seq_length, embed_dim).contiguous()
|
||||
attn_output = self.out_proj(attn_output)
|
||||
|
||||
if not output_attentions:
|
||||
attn_weights = None
|
||||
return attn_output, attn_weights
|
||||
|
||||
|
||||
@ -385,8 +380,18 @@ class CLIPEncoderLayer(GradientCheckpointingLayer):
|
||||
hidden_states: torch.Tensor,
|
||||
attention_mask: torch.Tensor,
|
||||
causal_attention_mask: torch.Tensor,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> torch.FloatTensor:
|
||||
output_attentions: Optional[bool] = False,
|
||||
) -> tuple[torch.FloatTensor]:
|
||||
"""
|
||||
Args:
|
||||
hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
|
||||
attention_mask (`torch.FloatTensor`): attention mask of size
|
||||
`(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
|
||||
`(config.encoder_attention_heads,)`.
|
||||
output_attentions (`bool`, *optional*):
|
||||
Whether or not to return the attentions tensors of all attention layers. See `attentions` under
|
||||
returned tensors for more detail.
|
||||
"""
|
||||
residual = hidden_states
|
||||
|
||||
hidden_states = self.layer_norm1(hidden_states)
|
||||
@ -394,7 +399,7 @@ class CLIPEncoderLayer(GradientCheckpointingLayer):
|
||||
hidden_states=hidden_states,
|
||||
attention_mask=attention_mask,
|
||||
causal_attention_mask=causal_attention_mask,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
)
|
||||
hidden_states = residual + hidden_states
|
||||
|
||||
@ -403,7 +408,12 @@ class CLIPEncoderLayer(GradientCheckpointingLayer):
|
||||
hidden_states = self.mlp(hidden_states)
|
||||
hidden_states = residual + hidden_states
|
||||
|
||||
return hidden_states
|
||||
outputs = (hidden_states,)
|
||||
|
||||
if output_attentions:
|
||||
outputs += (attn_weights,)
|
||||
|
||||
return outputs
|
||||
|
||||
|
||||
@auto_docstring
|
||||
@ -416,10 +426,6 @@ class CLIPPreTrainedModel(PreTrainedModel):
|
||||
_supports_flash_attn = True
|
||||
_supports_flex_attn = True
|
||||
_supports_attention_backend = True
|
||||
_can_record_outputs = {
|
||||
"hidden_states": CLIPEncoderLayer,
|
||||
"attentions": CLIPAttention,
|
||||
}
|
||||
|
||||
def _init_weights(self, module):
|
||||
"""Initialize the weights"""
|
||||
@ -498,7 +504,8 @@ class CLIPEncoder(nn.Module):
|
||||
inputs_embeds,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
causal_attention_mask: Optional[torch.Tensor] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
) -> BaseModelOutput:
|
||||
r"""
|
||||
Args:
|
||||
@ -520,18 +527,46 @@ class CLIPEncoder(nn.Module):
|
||||
- 0 for tokens that are **masked**.
|
||||
|
||||
[What are attention masks?](../glossary#attention-mask)
|
||||
output_attentions (`bool`, *optional*):
|
||||
Whether or not to return the attentions tensors of all attention layers. See `attentions` under
|
||||
returned tensors for more detail.
|
||||
output_hidden_states (`bool`, *optional*):
|
||||
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
|
||||
for more detail.
|
||||
return_dict (`bool`, *optional*):
|
||||
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
|
||||
"""
|
||||
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
|
||||
output_hidden_states = (
|
||||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
|
||||
)
|
||||
|
||||
encoder_states = () if output_hidden_states else None
|
||||
all_attentions = () if output_attentions else None
|
||||
|
||||
hidden_states = inputs_embeds
|
||||
for encoder_layer in self.layers:
|
||||
hidden_states = encoder_layer(
|
||||
for idx, encoder_layer in enumerate(self.layers):
|
||||
if output_hidden_states:
|
||||
encoder_states = encoder_states + (hidden_states,)
|
||||
layer_outputs = encoder_layer(
|
||||
hidden_states,
|
||||
attention_mask,
|
||||
causal_attention_mask,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
)
|
||||
|
||||
hidden_states = layer_outputs[0]
|
||||
|
||||
if output_attentions:
|
||||
all_attentions = all_attentions + (layer_outputs[1],)
|
||||
|
||||
if output_hidden_states:
|
||||
encoder_states = encoder_states + (hidden_states,)
|
||||
|
||||
return BaseModelOutput(
|
||||
last_hidden_state=hidden_states,
|
||||
hidden_states=encoder_states,
|
||||
attentions=all_attentions,
|
||||
)
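One variant of the encoder loop in this hunk collects per-layer hidden states and attention weights explicitly in tuples. The sketch below shows that bookkeeping on toy data, assuming scalar "hidden states" and string "attention weights"; `make_layer` and `run_encoder` are hypothetical names, not the CLIP implementation.

```python
def make_layer(offset):
    """Build a toy layer that adds `offset` and optionally returns fake attention weights."""

    def layer(hidden_states, output_attentions=False):
        outputs = (hidden_states + offset,)
        if output_attentions:
            outputs += (f"attn@{offset}",)
        return outputs

    return layer


def run_encoder(layers, inputs_embeds, output_attentions=False, output_hidden_states=False):
    # Start with empty tuples only for the outputs that were requested.
    encoder_states = () if output_hidden_states else None
    all_attentions = () if output_attentions else None

    hidden_states = inputs_embeds
    for encoder_layer in layers:
        if output_hidden_states:
            # Record the input to each layer (the embeddings come first).
            encoder_states = encoder_states + (hidden_states,)
        layer_outputs = encoder_layer(hidden_states, output_attentions=output_attentions)
        hidden_states = layer_outputs[0]
        if output_attentions:
            all_attentions = all_attentions + (layer_outputs[1],)

    if output_hidden_states:
        # Record the output of the last layer as well.
        encoder_states = encoder_states + (hidden_states,)

    return hidden_states, encoder_states, all_attentions


if __name__ == "__main__":
    toy_layers = [make_layer(1), make_layer(10)]
    print(run_encoder(toy_layers, 0, output_attentions=True, output_hidden_states=True))
```

With N layers this yields N + 1 hidden states (embeddings plus one per layer) and N attention entries.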
|
||||
|
||||
|
||||
@ -553,8 +588,14 @@ class CLIPTextTransformer(nn.Module):
|
||||
input_ids: Optional[torch.Tensor] = None,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
position_ids: Optional[torch.Tensor] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
) -> BaseModelOutputWithPooling:
|
||||
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
|
||||
output_hidden_states = (
|
||||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
|
||||
)
|
||||
|
||||
if input_ids is None:
|
||||
raise ValueError("You have to specify input_ids")
|
||||
|
||||
@ -563,18 +604,23 @@ class CLIPTextTransformer(nn.Module):
|
||||
|
||||
hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
|
||||
|
||||
# CLIP's text model uses causal mask, prepare it here.
|
||||
# https://github.com/openai/CLIP/blob/cfcffb90e69f37bf2ff1e988237a0fbe41f33c04/clip/model.py#L324
|
||||
causal_attention_mask = _create_4d_causal_attention_mask(
|
||||
input_shape, hidden_states.dtype, device=hidden_states.device
|
||||
)
|
||||
|
||||
# expand attention_mask
|
||||
if attention_mask is not None and self.config._attn_implementation != "flash_attention_2":
|
||||
# [batch_size, seq_len] -> [batch_size, 1, tgt_seq_len, src_seq_len]
|
||||
attention_mask = _prepare_4d_attention_mask(attention_mask, hidden_states.dtype)
|
||||
|
||||
encoder_outputs: BaseModelOutput = self.encoder(
|
||||
inputs_embeds=hidden_states,
|
||||
attention_mask=attention_mask,
|
||||
causal_attention_mask=causal_attention_mask,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
last_hidden_state = encoder_outputs.last_hidden_state
|
||||
@ -605,6 +651,8 @@ class CLIPTextTransformer(nn.Module):
|
||||
return BaseModelOutputWithPooling(
|
||||
last_hidden_state=last_hidden_state,
|
||||
pooler_output=pooled_output,
|
||||
hidden_states=encoder_outputs.hidden_states,
|
||||
attentions=encoder_outputs.attentions,
|
||||
)
|
||||
|
||||
|
||||
@ -632,7 +680,6 @@ class CLIPTextModel(CLIPPreTrainedModel):
|
||||
def set_input_embeddings(self, value):
|
||||
self.text_model.embeddings.token_embedding = value
|
||||
|
||||
@check_model_inputs()
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
@ -640,7 +687,8 @@ class CLIPTextModel(CLIPPreTrainedModel):
|
||||
input_ids: Optional[torch.Tensor] = None,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
position_ids: Optional[torch.Tensor] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
) -> BaseModelOutputWithPooling:
|
||||
r"""
|
||||
Examples:
|
||||
@ -662,7 +710,8 @@ class CLIPTextModel(CLIPPreTrainedModel):
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
position_ids=position_ids,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
|
||||
@ -681,9 +730,15 @@ class CLIPVisionTransformer(nn.Module):
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: Optional[bool] = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> BaseModelOutputWithPooling:
|
||||
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
|
||||
output_hidden_states = (
|
||||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
|
||||
)
|
||||
|
||||
if pixel_values is None:
|
||||
raise ValueError("You have to specify pixel_values")
|
||||
|
||||
@ -692,7 +747,8 @@ class CLIPVisionTransformer(nn.Module):
|
||||
|
||||
encoder_outputs: BaseModelOutput = self.encoder(
|
||||
inputs_embeds=hidden_states,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
last_hidden_state = encoder_outputs.last_hidden_state
|
||||
@ -702,6 +758,8 @@ class CLIPVisionTransformer(nn.Module):
|
||||
return BaseModelOutputWithPooling(
|
||||
last_hidden_state=last_hidden_state,
|
||||
pooler_output=pooled_output,
|
||||
hidden_states=encoder_outputs.hidden_states,
|
||||
attentions=encoder_outputs.attentions,
|
||||
)
|
||||
|
||||
|
||||
@ -725,14 +783,14 @@ class CLIPVisionModel(CLIPPreTrainedModel):
|
||||
def get_input_embeddings(self) -> nn.Module:
|
||||
return self.vision_model.embeddings.patch_embedding
|
||||
|
||||
@check_model_inputs(tie_last_hidden_states=False)
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> BaseModelOutputWithPooling:
|
||||
r"""
|
||||
Example:
|
||||
@ -757,8 +815,9 @@ class CLIPVisionModel(CLIPPreTrainedModel):
|
||||
|
||||
return self.vision_model(
|
||||
pixel_values=pixel_values,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
@ -888,8 +947,9 @@ class CLIPModel(CLIPPreTrainedModel):
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
position_ids: Optional[torch.LongTensor] = None,
|
||||
return_loss: Optional[bool] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> CLIPOutput:
|
||||
r"""
|
||||
return_loss (`bool`, *optional*):
|
||||
@ -917,17 +977,25 @@ class CLIPModel(CLIPPreTrainedModel):
|
||||
>>> logits_per_image = outputs.logits_per_image # this is the image-text similarity score
|
||||
>>> probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
|
||||
```"""
|
||||
# Use CLIP model's config for some fields (if specified) instead of those of vision & text components.
|
||||
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
|
||||
output_hidden_states = (
|
||||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
|
||||
)
|
||||
|
||||
vision_outputs: BaseModelOutputWithPooling = self.vision_model(
|
||||
pixel_values=pixel_values,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
text_outputs: BaseModelOutputWithPooling = self.text_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
position_ids=position_ids,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
image_embeds = vision_outputs.pooler_output
|
||||
@ -986,7 +1054,6 @@ class CLIPTextModelWithProjection(CLIPPreTrainedModel):
|
||||
def set_input_embeddings(self, value):
|
||||
self.text_model.embeddings.token_embedding = value
|
||||
|
||||
@check_model_inputs()
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
@ -994,7 +1061,8 @@ class CLIPTextModelWithProjection(CLIPPreTrainedModel):
|
||||
input_ids: Optional[torch.Tensor] = None,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
position_ids: Optional[torch.Tensor] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
) -> CLIPTextModelOutput:
|
||||
r"""
|
||||
Examples:
|
||||
@ -1017,7 +1085,8 @@ class CLIPTextModelWithProjection(CLIPPreTrainedModel):
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
position_ids=position_ids,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
pooled_output = text_outputs.pooler_output
|
||||
text_embeds = self.text_projection(pooled_output)
|
||||
@ -1025,6 +1094,8 @@ class CLIPTextModelWithProjection(CLIPPreTrainedModel):
|
||||
return CLIPTextModelOutput(
|
||||
text_embeds=text_embeds,
|
||||
last_hidden_state=text_outputs.last_hidden_state,
|
||||
hidden_states=text_outputs.hidden_states,
|
||||
attentions=text_outputs.attentions,
|
||||
)
|
||||
|
||||
|
||||
@ -1048,14 +1119,14 @@ class CLIPVisionModelWithProjection(CLIPPreTrainedModel):
|
||||
def get_input_embeddings(self) -> nn.Module:
|
||||
return self.vision_model.embeddings.patch_embedding
|
||||
|
||||
@check_model_inputs(tie_last_hidden_states=False)
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> CLIPVisionModelOutput:
|
||||
r"""
|
||||
Examples:
|
||||
@ -1080,8 +1151,9 @@ class CLIPVisionModelWithProjection(CLIPPreTrainedModel):
|
||||
|
||||
vision_outputs: BaseModelOutputWithPooling = self.vision_model(
|
||||
pixel_values=pixel_values,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
**kwargs,
|
||||
)
|
||||
pooled_output = vision_outputs.pooler_output
|
||||
image_embeds = self.visual_projection(pooled_output)
|
||||
@ -1089,6 +1161,8 @@ class CLIPVisionModelWithProjection(CLIPPreTrainedModel):
|
||||
return CLIPVisionModelOutput(
|
||||
image_embeds=image_embeds,
|
||||
last_hidden_state=vision_outputs.last_hidden_state,
|
||||
hidden_states=vision_outputs.hidden_states,
|
||||
attentions=vision_outputs.attentions,
|
||||
)
|
||||
|
||||
|
||||
@ -1117,14 +1191,14 @@ class CLIPForImageClassification(CLIPPreTrainedModel):
|
||||
# Initialize weights and apply final processing
|
||||
self.post_init()
|
||||
|
||||
@check_model_inputs()
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.Tensor] = None,
|
||||
labels: Optional[torch.Tensor] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
) -> ImageClassifierOutput:
|
||||
r"""
|
||||
labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
|
||||
@ -1132,14 +1206,22 @@ class CLIPForImageClassification(CLIPPreTrainedModel):
|
||||
config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If
|
||||
`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
|
||||
"""
|
||||
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
|
||||
output_hidden_states = (
|
||||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
|
||||
)
|
||||
|
||||
outputs: BaseModelOutputWithPooling = self.vision_model(
|
||||
pixel_values,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
sequence_output = outputs.last_hidden_state
|
||||
|
||||
# average pool the patch tokens
|
||||
sequence_output = torch.mean(sequence_output[:, 1:, :], dim=1)
|
||||
# apply classifier
|
||||
logits = self.classifier(sequence_output)
|
||||
|
||||
loss = None
|
||||
@ -1149,6 +1231,8 @@ class CLIPForImageClassification(CLIPPreTrainedModel):
|
||||
return ImageClassifierOutput(
|
||||
loss=loss,
|
||||
logits=logits,
|
||||
hidden_states=outputs.hidden_states,
|
||||
attentions=outputs.attentions,
|
||||
)
|
||||
|
||||
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""CodeGen model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any, Optional
|
||||
|
||||
from ... import PreTrainedTokenizer, is_torch_available
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfigWithPast, PatchingSpec
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -140,4 +146,85 @@ class CodeGenConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["CodeGenConfig"]
|
||||
# Copied from transformers.models.gpt2.configuration_gpt2.GPT2OnnxConfig with GPT2->CodeGen
|
||||
class CodeGenOnnxConfig(OnnxConfigWithPast):
|
||||
def __init__(
|
||||
self,
|
||||
config: PreTrainedConfig,
|
||||
task: str = "default",
|
||||
patching_specs: Optional[list[PatchingSpec]] = None,
|
||||
use_past: bool = False,
|
||||
):
|
||||
super().__init__(config, task=task, patching_specs=patching_specs, use_past=use_past)
|
||||
if not getattr(self._config, "pad_token_id", None):
|
||||
# TODO: how to do that better?
|
||||
self._config.pad_token_id = 0
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def num_layers(self) -> int:
|
||||
return self._config.n_layer
|
||||
|
||||
@property
|
||||
def num_attention_heads(self) -> int:
|
||||
return self._config.n_head
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
# We need to order the inputs in the way they appear in forward()
|
||||
ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})
|
||||
|
||||
# Need to add the past_keys
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
# Not using the same length for past_key_values
|
||||
past_key_values_length = seqlen + 2
|
||||
past_shape = (
|
||||
batch,
|
||||
self.num_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // self.num_attention_heads,
|
||||
)
|
||||
ordered_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(self.num_layers)
|
||||
]
|
||||
|
||||
ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
|
||||
if self.use_past:
|
||||
mask_dtype = ordered_inputs["attention_mask"].dtype
|
||||
ordered_inputs["attention_mask"] = torch.cat(
|
||||
[ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
|
||||
return ordered_inputs
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
|
||||
__all__ = ["CodeGenConfig", "CodeGenOnnxConfig"]
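When `use_past` is set, `generate_dummy_inputs` above creates one zero-filled key/value pair per layer and extends the attention mask to cover the past positions. The shape arithmetic can be sketched without PyTorch; `past_kv_layout` is a hypothetical helper for illustration and the example sizes are arbitrary assumptions.

```python
def past_kv_layout(batch: int, seqlen: int, num_layers: int, num_attention_heads: int, hidden_size: int) -> dict:
    """Hypothetical helper reproducing the shape arithmetic of the dummy past inputs."""
    # The past length deliberately differs from the current sequence length.
    past_key_values_length = seqlen + 2
    past_shape = (
        batch,
        num_attention_heads,
        past_key_values_length,
        hidden_size // num_attention_heads,  # per-head dimension
    )
    # The mask is the original (batch, seqlen) mask concatenated with ones over the past positions.
    attention_mask_shape = (batch, seqlen + past_key_values_length)
    return {
        "past_shape": past_shape,
        "num_past_pairs": num_layers,  # one (key, value) pair per layer
        "attention_mask_shape": attention_mask_shape,
    }


if __name__ == "__main__":
    print(past_kv_layout(batch=2, seqlen=8, num_layers=4, num_attention_heads=16, hidden_size=1024))
```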
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""Conditional DETR model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ...utils.backbone_utils import verify_backbone_config_arguments
|
||||
from ..auto import CONFIG_MAPPING, AutoConfig
|
||||
@ -241,4 +247,25 @@ class ConditionalDetrConfig(PreTrainedConfig):
|
||||
super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs)
|
||||
|
||||
|
||||
__all__ = ["ConditionalDetrConfig"]
|
||||
class ConditionalDetrOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
("pixel_mask", {0: "batch"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-5
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 12
|
||||
|
||||
|
||||
__all__ = ["ConditionalDetrConfig", "ConditionalDetrOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""ConvBERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -136,4 +140,21 @@ class ConvBertConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["ConvBertConfig"]
|
||||
# Copied from transformers.models.bert.configuration_bert.BertOnnxConfig
|
||||
class ConvBertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["ConvBertConfig", "ConvBertOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""ConvNeXT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
|
||||
|
||||
@ -117,4 +123,20 @@ class ConvNextConfig(BackboneConfigMixin, PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["ConvNextConfig"]
|
||||
class ConvNextOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-5
|
||||
|
||||
|
||||
__all__ = ["ConvNextConfig", "ConvNextOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""Data2VecText configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -124,4 +128,19 @@ class Data2VecTextConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["Data2VecTextConfig"]
|
||||
class Data2VecTextOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["Data2VecTextConfig", "Data2VecTextOnnxConfig"]
|
||||
|
@@ -14,7 +14,13 @@
# limitations under the License.
"""Data2VecVision model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -168,4 +174,21 @@ class Data2VecVisionConfig(PreTrainedConfig):
        self.semantic_loss_ignore_index = semantic_loss_ignore_index


__all__ = ["Data2VecVisionConfig"]
# Copied from transformers.models.vit.configuration_vit.ViTOnnxConfig
class Data2VecVisionOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["Data2VecVisionConfig", "Data2VecVisionOnnxConfig"]
@@ -14,10 +14,19 @@
# limitations under the License.
"""DeBERTa model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Union

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


if TYPE_CHECKING:
    from ... import FeatureExtractionMixin, PreTrainedTokenizerBase


logger = logging.get_logger(__name__)


@@ -150,4 +159,41 @@ class DebertaConfig(PreTrainedConfig):
        self.legacy = legacy


__all__ = ["DebertaConfig"]
# Copied from transformers.models.deberta_v2.configuration_deberta_v2.DebertaV2OnnxConfig
class DebertaOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        if self._config.type_vocab_size > 0:
            return OrderedDict(
                [("input_ids", dynamic_axis), ("attention_mask", dynamic_axis), ("token_type_ids", dynamic_axis)]
            )
        else:
            return OrderedDict([("input_ids", dynamic_axis), ("attention_mask", dynamic_axis)])

    @property
    def default_onnx_opset(self) -> int:
        return 12

    def generate_dummy_inputs(
        self,
        preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
        batch_size: int = -1,
        seq_length: int = -1,
        num_choices: int = -1,
        is_pair: bool = False,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
        tokenizer: "PreTrainedTokenizerBase" = None,
    ) -> Mapping[str, Any]:
        dummy_inputs = super().generate_dummy_inputs(preprocessor=preprocessor)
        if self._config.type_vocab_size == 0 and "token_type_ids" in dummy_inputs:
            del dummy_inputs["token_type_ids"]
        return dummy_inputs


__all__ = ["DebertaConfig", "DebertaOnnxConfig"]
@@ -14,10 +14,19 @@
# limitations under the License.
"""DeBERTa-v2 model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Union

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


if TYPE_CHECKING:
    from ... import FeatureExtractionMixin, PreTrainedTokenizerBase


logger = logging.get_logger(__name__)


@@ -150,4 +159,40 @@ class DebertaV2Config(PreTrainedConfig):
        self.legacy = legacy


__all__ = ["DebertaV2Config"]
class DebertaV2OnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        if self._config.type_vocab_size > 0:
            return OrderedDict(
                [("input_ids", dynamic_axis), ("attention_mask", dynamic_axis), ("token_type_ids", dynamic_axis)]
            )
        else:
            return OrderedDict([("input_ids", dynamic_axis), ("attention_mask", dynamic_axis)])

    @property
    def default_onnx_opset(self) -> int:
        return 12

    def generate_dummy_inputs(
        self,
        preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
        batch_size: int = -1,
        seq_length: int = -1,
        num_choices: int = -1,
        is_pair: bool = False,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
        tokenizer: "PreTrainedTokenizerBase" = None,
    ) -> Mapping[str, Any]:
        dummy_inputs = super().generate_dummy_inputs(preprocessor=preprocessor)
        if self._config.type_vocab_size == 0 and "token_type_ids" in dummy_inputs:
            del dummy_inputs["token_type_ids"]
        return dummy_inputs


__all__ = ["DebertaV2Config", "DebertaV2OnnxConfig"]
@@ -14,7 +14,13 @@
# limitations under the License.
"""DeiT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -125,4 +131,20 @@ class DeiTConfig(PreTrainedConfig):
        self.pooler_act = pooler_act


__all__ = ["DeiTConfig"]
class DeiTOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["DeiTConfig", "DeiTOnnxConfig"]
@@ -14,7 +14,11 @@
# limitations under the License.
"""MEGA configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ....configuration_utils import PreTrainedConfig
from ....onnx import OnnxConfig
from ....utils import logging


@@ -221,4 +225,19 @@ class MegaConfig(PreTrainedConfig):
        self.num_attention_heads = 1  # not used but required by Hugging Face


__all__ = ["MegaConfig"]
class MegaOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["MegaConfig", "MegaOnnxConfig"]
@@ -14,7 +14,13 @@
# limitations under the License.
"""DETR model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
from ...utils.backbone_utils import verify_backbone_config_arguments
from ..auto import CONFIG_MAPPING, AutoConfig
@@ -240,4 +246,25 @@ class DetrConfig(PreTrainedConfig):
        super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs)


__all__ = ["DetrConfig"]
class DetrOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ("pixel_mask", {0: "batch"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-5

    @property
    def default_onnx_opset(self) -> int:
        return 12


__all__ = ["DetrConfig", "DetrOnnxConfig"]
@@ -14,7 +14,13 @@
# limitations under the License.
"""DINOv2 model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices

@@ -154,4 +160,20 @@ class Dinov2Config(BackboneConfigMixin, PreTrainedConfig):
        self.use_mask_token = use_mask_token


__all__ = ["Dinov2Config"]
class Dinov2OnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["Dinov2Config", "Dinov2OnnxConfig"]
@@ -14,7 +14,11 @@
# limitations under the License.
"""DistilBERT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -119,4 +123,19 @@ class DistilBertConfig(PreTrainedConfig):
        super().__init__(**kwargs, pad_token_id=pad_token_id)


__all__ = ["DistilBertConfig"]
class DistilBertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["DistilBertConfig", "DistilBertOnnxConfig"]
@@ -14,7 +14,13 @@
# limitations under the License.
"""EfficientNet model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -144,4 +150,20 @@ class EfficientNetConfig(PreTrainedConfig):
        self.num_hidden_layers = sum(num_block_repeats) * 4


__all__ = ["EfficientNetConfig"]
class EfficientNetOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-5


__all__ = ["EfficientNetConfig", "EfficientNetOnnxConfig"]
@@ -15,7 +15,11 @@
# limitations under the License.
"""ELECTRA model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -156,4 +160,20 @@ class ElectraConfig(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["ElectraConfig"]
class ElectraOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
                ("token_type_ids", dynamic_axis),
            ]
        )


__all__ = ["ElectraConfig", "ElectraOnnxConfig"]
@@ -15,7 +15,11 @@
# limitations under the License.
"""ERNIE model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -131,4 +135,21 @@ class ErnieConfig(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["ErnieConfig"]
class ErnieOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
                ("token_type_ids", dynamic_axis),
                ("task_type_ids", dynamic_axis),
            ]
        )


__all__ = ["ErnieConfig", "ErnieOnnxConfig"]
@@ -14,7 +14,11 @@
# limitations under the License.
"""Flaubert configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -213,4 +217,19 @@ class FlaubertConfig(PreTrainedConfig):
        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, **kwargs)


__all__ = ["FlaubertConfig"]
class FlaubertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["FlaubertConfig", "FlaubertOnnxConfig"]
@@ -447,7 +447,7 @@ def convert_transformer_weights(
            return zip([], [])
        else:
            raise ValueError(f"Unexpected member, {prop}, in Embedder.")
    elif f"{_TRANSFORMER_EMBEDDER}/mm_" in path:
    elif path.startswith(f"{_TRANSFORMER_EMBEDDER}/mm"):
        if not _INCLUDE_VISION_ENCODER.value:
            return zip([], [])

@@ -553,7 +553,7 @@ def convert(
                continue

            path, weights = convert_siglip_weight(config=config.vision_config, paths=paths, weights=value)
            update_tree(f"model.{path}", weights, config.vision_config.dtype)
            update_tree(path, weights, config.vision_config.dtype)
        else:
            for path, weights in convert_transformer_weights(config=config.text_config, paths=paths, weights=value):
                if not _INCLUDE_VISION_ENCODER.value:
@@ -768,23 +768,15 @@ def token_type_ids_mask_function(
        # If it's 1 for both query and key/value, we are in an image block
        # NOTE: static cache shape goes beyond input seq length, while token_type_ids.shape[1] == input seq length
        # Since vmap doesn't support `if statement` we workaround it with `torch.where`
        safe_q_idx = torch.where(q_idx < token_type_ids.shape[1], q_idx, 0)
        safe_kv_idx = torch.where(kv_idx < token_type_ids.shape[1], kv_idx, 0)

        token_type_ids_at_q_idx = token_type_ids[batch_idx, safe_q_idx]
        token_type_ids_at_q_idx = torch.where(q_idx < token_type_ids.shape[1], token_type_ids_at_q_idx, 0)

        token_type_ids_at_kv_idx = token_type_ids[batch_idx, safe_kv_idx]
        safe_idx = torch.where(kv_idx < token_type_ids.shape[1], kv_idx, 0)
        token_type_ids_at_kv_idx = token_type_ids[batch_idx, safe_idx]
        token_type_ids_at_kv_idx = torch.where(kv_idx < token_type_ids.shape[1], token_type_ids_at_kv_idx, 0)

        image_group_ids_at_q_idx = image_group_ids[batch_idx, safe_q_idx]
        image_group_ids_at_q_idx = torch.where(q_idx < image_group_ids.shape[1], image_group_ids_at_q_idx, -1)

        image_group_ids_at_kv_idx = image_group_ids[batch_idx, safe_kv_idx]
        image_group_ids_at_kv_idx = image_group_ids[batch_idx, safe_idx]
        image_group_ids_at_kv_idx = torch.where(kv_idx < image_group_ids.shape[1], image_group_ids_at_kv_idx, -1)

        is_image_block = (token_type_ids_at_q_idx == 1) & (token_type_ids_at_kv_idx == 1)
        same_image_block = image_group_ids_at_q_idx == image_group_ids_at_kv_idx
        is_image_block = (token_type_ids[batch_idx, q_idx] == 1) & (token_type_ids_at_kv_idx == 1)
        same_image_block = image_group_ids[batch_idx, q_idx] == image_group_ids_at_kv_idx

        # This is bidirectional attention whenever we are dealing with image tokens
        return is_image_block & same_image_block
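The hunk above guards every lookup with a clamp-then-mask pattern: the index is first clamped into range so the read is always valid, and the fetched value is then replaced by a neutral default wherever the original index was out of range. A minimal pure-Python sketch of that idea follows, modelled on the variant that clamps both the query and the key/value index. The names `lookup_or_default` and `is_same_image_block` and the sample lists are hypothetical, and plain conditional expressions stand in for `torch.where`, so this only illustrates the logic and is not vmap-compatible code.

```python
def lookup_or_default(values, idx, default):
    """Clamp-then-mask lookup that never reads out of range."""
    in_range = idx < len(values)
    safe_idx = idx if in_range else 0          # role of torch.where(idx < n, idx, 0)
    fetched = values[safe_idx]                 # always a valid read
    return fetched if in_range else default    # role of the second torch.where


def is_same_image_block(token_type_ids, image_group_ids, q_idx, kv_idx):
    """True only when both positions are image tokens of the same image group."""
    q_type = lookup_or_default(token_type_ids, q_idx, 0)
    kv_type = lookup_or_default(token_type_ids, kv_idx, 0)
    q_group = lookup_or_default(image_group_ids, q_idx, -1)
    kv_group = lookup_or_default(image_group_ids, kv_idx, -1)
    return (q_type == 1) and (kv_type == 1) and (q_group == kv_group)


# Illustrative data: positions 1-2 belong to image 0, position 4 to image 1.
token_type_ids = [0, 1, 1, 0, 1]
image_group_ids = [-1, 0, 0, -1, 1]

print(is_same_image_block(token_type_ids, image_group_ids, 1, 2))
print(is_same_image_block(token_type_ids, image_group_ids, 1, 4))
print(is_same_image_block(token_type_ids, image_group_ids, 1, 7))
```

The third call uses a key/value index past the end of the lists, which is the static-cache situation the `NOTE` comment describes: the lookup falls back to token type `0`, so the position is never treated as part of an image block.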
@@ -15,7 +15,13 @@
# limitations under the License.
"""OpenAI GPT-2 configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any, Optional

from ... import PreTrainedTokenizer, is_torch_available
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfigWithPast, PatchingSpec
from ...utils import logging


@@ -184,4 +190,84 @@ class GPT2Config(PreTrainedConfig):
        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)


__all__ = ["GPT2Config"]
class GPT2OnnxConfig(OnnxConfigWithPast):
    def __init__(
        self,
        config: PreTrainedConfig,
        task: str = "default",
        patching_specs: Optional[list[PatchingSpec]] = None,
        use_past: bool = False,
    ):
        super().__init__(config, task=task, patching_specs=patching_specs, use_past=use_past)
        if not getattr(self._config, "pad_token_id", None):
            # TODO: how to do that better?
            self._config.pad_token_id = 0

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
        if self.use_past:
            self.fill_with_past_key_values_(common_inputs, direction="inputs")
            common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
        else:
            common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}

        return common_inputs

    @property
    def num_layers(self) -> int:
        return self._config.n_layer

    @property
    def num_attention_heads(self) -> int:
        return self._config.n_head

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
            tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
        )

        # We need to order the inputs in the way they appear in forward()
        ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})

        # Need to add the past_keys
        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch

                batch, seqlen = common_inputs["input_ids"].shape
                # Not using the same length for past_key_values
                past_key_values_length = seqlen + 2
                past_shape = (
                    batch,
                    self.num_attention_heads,
                    past_key_values_length,
                    self._config.hidden_size // self.num_attention_heads,
                )
                ordered_inputs["past_key_values"] = [
                    (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(self.num_layers)
                ]

        ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
        if self.use_past:
            mask_dtype = ordered_inputs["attention_mask"].dtype
            ordered_inputs["attention_mask"] = torch.cat(
                [ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )

        return ordered_inputs

    @property
    def default_onnx_opset(self) -> int:
        return 13


__all__ = ["GPT2Config", "GPT2OnnxConfig"]
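The shape arithmetic in `generate_dummy_inputs` above can be checked without PyTorch. The sketch below reproduces only the bookkeeping: the past length is deliberately `seqlen + 2`, each layer gets one key shape and one value shape, and the attention mask grows by the past length. The function name `dummy_past_shapes` is hypothetical, and the sizes in the example (12 layers, 12 heads, hidden size 768, batch 2, sequence length 8) are illustrative values chosen for this sketch, not read from the diff.

```python
def dummy_past_shapes(batch, seqlen, num_layers, num_heads, hidden_size):
    """Return per-layer (key, value) shapes and the extended attention-mask shape."""
    # The past length intentionally differs from the current sequence length.
    past_len = seqlen + 2
    head_dim = hidden_size // num_heads
    past_shape = (batch, num_heads, past_len, head_dim)
    # One (key, value) pair of identical shapes per layer.
    per_layer = [(past_shape, past_shape) for _ in range(num_layers)]
    # The mask covers the current tokens plus the cached positions.
    mask_shape = (batch, seqlen + past_len)
    return per_layer, mask_shape


per_layer, mask_shape = dummy_past_shapes(batch=2, seqlen=8, num_layers=12, num_heads=12, hidden_size=768)
print(len(per_layer), per_layer[0][0], mask_shape)
```

Using a past length that differs from the sequence length keeps the two dimensions distinguishable in the dummy inputs, which matches the "Not using the same length for past_key_values" comment in the diff.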
@@ -14,7 +14,13 @@
# limitations under the License.
"""GPT Neo model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any

from ... import PreTrainedTokenizer, is_torch_available
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfigWithPast
from ...utils import logging


@@ -199,4 +205,71 @@ def custom_get_block_length_and_num_blocks(seq_length, window_size):
    return largest_divisor, torch.div(seq_length, largest_divisor, rounding_mode="floor")


__all__ = ["GPTNeoConfig"]
class GPTNeoOnnxConfig(OnnxConfigWithPast):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
        if self.use_past:
            self.fill_with_past_key_values_(common_inputs, direction="inputs")
            common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
        else:
            common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}

        return common_inputs

    @property
    def num_attention_heads(self) -> int:
        return self._config.num_heads

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
            tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
            is_pair=is_pair,
        )

        # We need to order the inputs in the way they appear in forward()
        ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})

        # Need to add the past_keys
        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch

                batch, seqlen = common_inputs["input_ids"].shape
                # Not using the same length for past_key_values
                past_key_values_length = seqlen + 2
                past_shape = (
                    batch,
                    self.num_attention_heads,
                    past_key_values_length,
                    self._config.hidden_size // self.num_attention_heads,
                )
                ordered_inputs["past_key_values"] = [
                    (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(self.num_layers)
                ]

        ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
        if self.use_past:
            mask_dtype = ordered_inputs["attention_mask"].dtype
            ordered_inputs["attention_mask"] = torch.cat(
                [ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )

        return ordered_inputs

    @property
    def default_onnx_opset(self) -> int:
        return 13


__all__ = ["GPTNeoConfig", "GPTNeoOnnxConfig"]
@@ -14,7 +14,13 @@
# limitations under the License.
"""GPT-J model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any, Optional

from ... import PreTrainedTokenizer, is_torch_available
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfigWithPast, PatchingSpec
from ...utils import logging


@@ -129,4 +135,85 @@ class GPTJConfig(PreTrainedConfig):
        )


__all__ = ["GPTJConfig"]
# Copied from transformers.models.gpt2.configuration_gpt2.GPT2OnnxConfig
class GPTJOnnxConfig(OnnxConfigWithPast):
    def __init__(
        self,
        config: PreTrainedConfig,
        task: str = "default",
        patching_specs: Optional[list[PatchingSpec]] = None,
        use_past: bool = False,
    ):
        super().__init__(config, task=task, patching_specs=patching_specs, use_past=use_past)
        if not getattr(self._config, "pad_token_id", None):
            # TODO: how to do that better?
            self._config.pad_token_id = 0

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
        if self.use_past:
            self.fill_with_past_key_values_(common_inputs, direction="inputs")
            common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
        else:
            common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}

        return common_inputs

    @property
    def num_layers(self) -> int:
        return self._config.n_layer

    @property
    def num_attention_heads(self) -> int:
        return self._config.n_head

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
            tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
        )

        # We need to order the inputs in the way they appear in forward()
        ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})

        # Need to add the past_keys
        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch

                batch, seqlen = common_inputs["input_ids"].shape
                # Not using the same length for past_key_values
                past_key_values_length = seqlen + 2
                past_shape = (
                    batch,
                    self.num_attention_heads,
                    past_key_values_length,
                    self._config.hidden_size // self.num_attention_heads,
                )
                ordered_inputs["past_key_values"] = [
                    (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(self.num_layers)
                ]

        ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
        if self.use_past:
            mask_dtype = ordered_inputs["attention_mask"].dtype
            ordered_inputs["attention_mask"] = torch.cat(
                [ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )

        return ordered_inputs

    @property
    def default_onnx_opset(self) -> int:
        return 13


__all__ = ["GPTJConfig", "GPTJOnnxConfig"]
@@ -14,10 +14,19 @@
# limitations under the License.
"""GroupViT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


if TYPE_CHECKING:
    from ...processing_utils import ProcessorMixin


logger = logging.get_logger(__name__)


@@ -351,4 +360,52 @@ class GroupViTConfig(PreTrainedConfig):
        super().__init__(**kwargs)


__all__ = ["GroupViTConfig", "GroupViTTextConfig", "GroupViTVisionConfig"]
class GroupViTOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("logits_per_image", {0: "batch"}),
                ("logits_per_text", {0: "batch"}),
                ("text_embeds", {0: "batch"}),
                ("image_embeds", {0: "batch"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4

    def generate_dummy_inputs(
        self,
        processor: "ProcessorMixin",
        batch_size: int = -1,
        seq_length: int = -1,
    ) -> Mapping[str, Any]:
        text_input_dict = super().generate_dummy_inputs(
            processor.tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
        )
        image_input_dict = super().generate_dummy_inputs(
            processor.image_processor,
            batch_size=batch_size,
        )
        return {**text_input_dict, **image_input_dict}

    @property
    def default_onnx_opset(self) -> int:
        return 14


__all__ = ["GroupViTConfig", "GroupViTOnnxConfig", "GroupViTTextConfig", "GroupViTVisionConfig"]
@@ -16,7 +16,11 @@
# limitations under the License.
"""I-BERT configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -112,4 +116,19 @@ class IBertConfig(PreTrainedConfig):
        self.force_dequant = force_dequant


__all__ = ["IBertConfig"]
class IBertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["IBertConfig", "IBertOnnxConfig"]
@@ -14,10 +14,18 @@
# limitations under the License.
"""OpenAI ImageGPT configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


if TYPE_CHECKING:
    from ... import FeatureExtractionMixin

logger = logging.get_logger(__name__)


@@ -136,4 +144,54 @@ class ImageGPTConfig(PreTrainedConfig):
        super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)


__all__ = ["ImageGPTConfig"]
class ImageGPTOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
            ]
        )

    def generate_dummy_inputs(
        self,
        preprocessor: "FeatureExtractionMixin",
        batch_size: int = 1,
        seq_length: int = -1,
        is_pair: bool = False,
        num_channels: int = 3,
        image_width: int = 32,
        image_height: int = 32,
    ) -> Mapping[str, Any]:
        """
        Generate inputs to provide to the ONNX exporter.

        Args:
            preprocessor ([`PreTrainedTokenizerBase`] or [`FeatureExtractionMixin`]):
                The preprocessor associated with this model configuration.
            batch_size (`int`, *optional*, defaults to 1):
                The batch size to export the model for (-1 means dynamic axis).
            num_choices (`int`, *optional*, defaults to -1):
                The number of candidate answers provided for multiple choice task (-1 means dynamic axis).
            seq_length (`int`, *optional*, defaults to -1):
                The sequence length to export the model for (-1 means dynamic axis).
            is_pair (`bool`, *optional*, defaults to `False`):
                Indicate if the input is a pair (sentence 1, sentence 2).
            num_channels (`int`, *optional*, defaults to 3):
                The number of channels of the generated images.
            image_width (`int`, *optional*, defaults to 32):
                The width of the generated images.
            image_height (`int`, *optional*, defaults to 32):
                The height of the generated images.

        Returns:
            Mapping[str, Tensor] holding the kwargs to provide to the model's forward function
        """

        input_image = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
        inputs = dict(preprocessor(images=input_image, return_tensors="pt"))

        return inputs


__all__ = ["ImageGPTConfig", "ImageGPTOnnxConfig"]
@ -14,8 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""LayoutLM model configuration"""
|
||||
|
||||
from ... import PreTrainedConfig
|
||||
from ...utils import logging
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any, Optional
|
||||
|
||||
from ... import PreTrainedConfig, PreTrainedTokenizer
|
||||
from ...onnx import OnnxConfig, PatchingSpec
|
||||
from ...utils import is_torch_available, logging
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
@ -122,4 +127,64 @@ class LayoutLMConfig(PreTrainedConfig):
|
||||
self.max_2d_position_embeddings = max_2d_position_embeddings
|
||||
|
||||
|
||||
__all__ = ["LayoutLMConfig"]
|
||||
class LayoutLMOnnxConfig(OnnxConfig):
    def __init__(
        self,
        config: PreTrainedConfig,
        task: str = "default",
        patching_specs: Optional[list[PatchingSpec]] = None,
    ):
        super().__init__(config, task=task, patching_specs=patching_specs)
        self.max_2d_positions = config.max_2d_position_embeddings - 1

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("bbox", {0: "batch", 1: "sequence"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
                ("token_type_ids", {0: "batch", 1: "sequence"}),
            ]
        )

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        """
        Generate inputs to provide to the ONNX exporter

        Args:
            tokenizer: The tokenizer associated with this model configuration
            batch_size: The batch size (int) to export the model for (-1 means dynamic axis)
            seq_length: The sequence length (int) to export the model for (-1 means dynamic axis)
            is_pair: Indicate if the input is a pair (sentence 1, sentence 2)

        Returns:
            Mapping[str, Tensor] holding the kwargs to provide to the model's forward function
        """

        input_dict = super().generate_dummy_inputs(
            tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
            is_pair=is_pair,
        )

        # Generate a dummy bbox
        box = [48, 84, 73, 128]

        if not is_torch_available():
            raise ValueError("Cannot generate dummy inputs without PyTorch installed.")
        import torch

        batch_size, seq_length = input_dict["input_ids"].shape
        input_dict["bbox"] = torch.tensor([*[box] * seq_length]).tile(batch_size, 1, 1)
        return input_dict
|
||||
|
||||
|
||||
__all__ = ["LayoutLMConfig", "LayoutLMOnnxConfig"]
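`LayoutLMOnnxConfig.generate_dummy_inputs` repeats one dummy box per token and then tiles it per sample, giving a `bbox` input of shape `(batch_size, seq_length, 4)`. A pure-Python stand-in for that shape logic, offered only as a sketch (the helper `make_dummy_bbox` is hypothetical; the diff itself uses `torch.tensor(...).tile(...)`):

```python
def make_dummy_bbox(batch_size: int, seq_length: int, box=(48, 84, 73, 128)):
    """Hypothetical nested-list equivalent of the tiled dummy bbox tensor."""
    # One copy of the box per token, then one such row list per sample.
    return [[list(box) for _ in range(seq_length)] for _ in range(batch_size)]


bbox = make_dummy_bbox(batch_size=2, seq_length=8)
# Dimensions are (batch, sequence, 4 box coordinates).
print(len(bbox), len(bbox[0]), len(bbox[0][0]))
```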
|
||||
|
@ -14,10 +14,22 @@
|
||||
# limitations under the License.
|
||||
"""LayoutLMv3 model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ...processing_utils import ProcessorMixin
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
|
||||
@ -175,4 +187,104 @@ class LayoutLMv3Config(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["LayoutLMv3Config"]
|
||||
class LayoutLMv3OnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.12")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        # The order of inputs is different for question answering and sequence classification
        if self.task in ["question-answering", "sequence-classification"]:
            return OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "sequence"}),
                    ("attention_mask", {0: "batch", 1: "sequence"}),
                    ("bbox", {0: "batch", 1: "sequence"}),
                    ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ]
            )
        else:
            return OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "sequence"}),
                    ("bbox", {0: "batch", 1: "sequence"}),
                    ("attention_mask", {0: "batch", 1: "sequence"}),
                    ("pixel_values", {0: "batch", 1: "num_channels"}),
                ]
            )

    @property
    def atol_for_validation(self) -> float:
        return 1e-5

    @property
    def default_onnx_opset(self) -> int:
        return 12

    def generate_dummy_inputs(
        self,
        processor: "ProcessorMixin",
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
    ) -> Mapping[str, Any]:
        """
        Generate inputs to provide to the ONNX exporter

        Args:
            processor ([`ProcessorMixin`]):
                The processor associated with this model configuration.
            batch_size (`int`, *optional*, defaults to -1):
                The batch size to export the model for (-1 means dynamic axis).
            seq_length (`int`, *optional*, defaults to -1):
                The sequence length to export the model for (-1 means dynamic axis).
            is_pair (`bool`, *optional*, defaults to `False`):
                Indicate if the input is a pair (sentence 1, sentence 2).
            num_channels (`int`, *optional*, defaults to 3):
                The number of channels of the generated images.
            image_width (`int`, *optional*, defaults to 40):
                The width of the generated images.
            image_height (`int`, *optional*, defaults to 40):
                The height of the generated images.

        Returns:
            Mapping[str, Any]: holding the kwargs to provide to the model's forward function
        """

        # A dummy image is used so OCR should not be applied
        setattr(processor.image_processor, "apply_ocr", False)

        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )
        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = processor.tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )
        # Generate dummy inputs according to compute batch and sequence
        dummy_text = [[" ".join([processor.tokenizer.unk_token]) * seq_length]] * batch_size

        # Generate dummy bounding boxes
        dummy_bboxes = [[[48, 84, 73, 128]]] * batch_size

        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        # batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
        dummy_image = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)

        inputs = dict(
            processor(
                dummy_image,
                text=dummy_text,
                boxes=dummy_bboxes,
                return_tensors="pt",
            )
        )

        return inputs
|
||||
|
||||
|
||||
__all__ = ["LayoutLMv3Config", "LayoutLMv3OnnxConfig"]
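`LayoutLMv3OnnxConfig.generate_dummy_inputs` resolves a dynamic axis (`-1`) to a fixed size before building dummy data: 2 samples for the batch and 8 tokens for the sequence, minus the special tokens the tokenizer adds. The sketch below shows that resolution as I understand it; the function name `effective_axis_dimension` and the special-token count of 2 are assumptions for illustration, and the real helper in `transformers.onnx.utils` may differ in detail.

```python
def effective_axis_dimension(dimension: int, fixed_dimension: int, num_token_to_add: int = 0) -> int:
    """Hypothetical re-implementation of the dynamic-axis resolution step."""
    # A non-positive value marks a dynamic axis, so fall back to the fixed size.
    if dimension <= 0:
        dimension = fixed_dimension
    # Leave room for the special tokens the tokenizer will add on its own.
    return dimension - num_token_to_add


batch = effective_axis_dimension(-1, fixed_dimension=2)
seq = effective_axis_dimension(-1, fixed_dimension=8, num_token_to_add=2)
print(batch, seq)
```

Exporting with a fixed size larger than 1 keeps the tracer from specializing the graph to a single sample or token, which is the optimization the comments in the diff refer to.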
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""LeViT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -118,4 +124,21 @@ class LevitConfig(PreTrainedConfig):
|
||||
]
|
||||
|
||||
|
||||
__all__ = ["LevitConfig"]
|
||||
# Copied from transformers.models.vit.configuration_vit.ViTOnnxConfig
class LevitOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4
|
||||
|
||||
|
||||
__all__ = ["LevitConfig", "LevitOnnxConfig"]
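`atol_for_validation` gives the absolute tolerance used when the exported model's outputs are compared with reference outputs. A minimal sketch of such a check, assuming a plain element-wise comparison on flat lists (the function `within_atol` is hypothetical; the actual validation code is not part of this diff):

```python
def within_atol(reference, exported, atol: float) -> bool:
    """Hypothetical check: every element differs by at most `atol`."""
    if len(reference) != len(exported):
        return False
    return all(abs(r - e) <= atol for r, e in zip(reference, exported))


reference = [0.12345, 1.0]
exported = [0.12349, 1.00005]
print(within_atol(reference, exported, atol=1e-4))
print(within_atol(reference, exported, atol=1e-5))
```

The two calls illustrate why vision configs such as this one use a looser tolerance (`1e-4`) than the `1e-5` seen in `LayoutLMv3OnnxConfig`.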
|
||||
|
@ -20,7 +20,6 @@ from ...utils.import_utils import define_import_structure
|
||||
if TYPE_CHECKING:
|
||||
from .configuration_lightglue import *
|
||||
from .image_processing_lightglue import *
|
||||
from .image_processing_lightglue_fast import *
|
||||
from .modeling_lightglue import *
|
||||
else:
|
||||
import sys
|
||||
|
@ -17,6 +17,7 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import warnings
|
||||
from typing import Optional, Union
|
||||
|
||||
import numpy as np
|
||||
@ -39,28 +40,20 @@ from ...image_utils import (
|
||||
valid_images,
|
||||
validate_preprocess_arguments,
|
||||
)
|
||||
from ...processing_utils import ImagesKwargs
|
||||
from ...utils import TensorType, logging, requires_backends
|
||||
from ...utils import TensorType, is_matplotlib_available, logging, requires_backends
|
||||
from ...utils.import_utils import requires
|
||||
from .modeling_lightglue import LightGlueKeypointMatchingOutput
|
||||
|
||||
|
||||
if is_vision_available():
|
||||
import PIL
|
||||
from PIL import Image, ImageDraw
|
||||
|
||||
if is_vision_available():
|
||||
import PIL
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
|
||||
class LightGlueImageProcessorKwargs(ImagesKwargs, total=False):
|
||||
r"""
|
||||
do_grayscale (`bool`, *optional*, defaults to `True`):
|
||||
Whether to convert the image to grayscale. Can be overridden by `do_grayscale` in the `preprocess` method.
|
||||
"""
|
||||
|
||||
do_grayscale: bool
|
||||
|
||||
|
||||
def is_grayscale(
|
||||
image: np.ndarray,
|
||||
input_data_format: Optional[Union[str, ChannelDimension]] = None,
|
||||
@ -468,5 +461,60 @@ class LightGlueImageProcessor(BaseImageProcessor):
|
||||
b = 0
|
||||
return (r, g, b)
|
||||
|
||||
def plot_keypoint_matching(self, images: ImageInput, keypoint_matching_output: LightGlueKeypointMatchingOutput):
|
||||
"""
|
||||
Plots the image pairs side by side with the detected keypoints as well as the matching between them. Requires
|
||||
matplotlib to be installed.
|
||||
|
||||
.. deprecated::
|
||||
`plot_keypoint_matching` is deprecated and will be removed in a future version. Use `visualize_keypoint_matching` instead.
|
||||
|
||||
Args:
|
||||
images (`ImageInput`):
|
||||
Image pairs to plot. Same as `LightGlueImageProcessor.preprocess`. Expects either a list of 2 images or
|
||||
a list of lists of 2 images, with pixel values ranging from 0 to 255.
|
||||
keypoint_matching_output ([`LightGlueKeypointMatchingOutput`]):
|
||||
Raw outputs of the model.
|
||||
"""
|
||||
warnings.warn(
|
||||
"`plot_keypoint_matching` is deprecated and will be removed in transformers v. "
|
||||
"Use `visualize_keypoint_matching` instead.",
|
||||
FutureWarning,
|
||||
)
|
||||
|
||||
if is_matplotlib_available():
|
||||
import matplotlib.pyplot as plt
|
||||
else:
|
||||
raise ImportError("Please install matplotlib to use `plot_keypoint_matching` method")
|
||||
|
||||
images = validate_and_format_image_pairs(images)
|
||||
images = [to_numpy_array(image) for image in images]
|
||||
image_pairs = [images[i : i + 2] for i in range(0, len(images), 2)]
|
||||
|
||||
for image_pair, pair_output in zip(image_pairs, keypoint_matching_output):
|
||||
height0, width0 = image_pair[0].shape[:2]
|
||||
height1, width1 = image_pair[1].shape[:2]
|
||||
plot_image = np.zeros((max(height0, height1), width0 + width1, 3))
|
||||
plot_image[:height0, :width0] = image_pair[0] / 255.0
|
||||
plot_image[:height1, width0:] = image_pair[1] / 255.0
|
||||
plt.imshow(plot_image)
|
||||
plt.axis("off")
|
||||
|
||||
keypoints0_x, keypoints0_y = pair_output["keypoints0"].unbind(1)
|
||||
keypoints1_x, keypoints1_y = pair_output["keypoints1"].unbind(1)
|
||||
for keypoint0_x, keypoint0_y, keypoint1_x, keypoint1_y, matching_score in zip(
|
||||
keypoints0_x, keypoints0_y, keypoints1_x, keypoints1_y, pair_output["matching_scores"]
|
||||
):
|
||||
plt.plot(
|
||||
[keypoint0_x, keypoint1_x + width0],
|
||||
[keypoint0_y, keypoint1_y],
|
||||
color=plt.get_cmap("RdYlGn")(matching_score.item()),
|
||||
alpha=0.9,
|
||||
linewidth=0.5,
|
||||
)
|
||||
plt.scatter(keypoint0_x, keypoint0_y, c="black", s=2)
|
||||
plt.scatter(keypoint1_x + width0, keypoint1_y, c="black", s=2)
|
||||
plt.show()
|
||||
|
||||
|
||||
__all__ = ["LightGlueImageProcessor"]
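The `_get_color` helper added to the image processor maps a matching score in `[0, 1]` to an RGB triple that runs from red (low score) to green (high score). A standalone sketch of the same mapping (the free function `score_to_rgb` is a hypothetical name used only for illustration):

```python
def score_to_rgb(score: float) -> tuple:
    """Map a score in [0, 1] to (r, g, b): red for low scores, green for high ones."""
    r = int(255 * (1 - score))
    g = int(255 * score)
    b = 0
    return (r, g, b)


for s in (0.0, 0.5, 1.0):
    print(s, score_to_rgb(s))
```

Because `int` truncates, a score of 0.5 gives 127 for both channels rather than 128.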
|
||||
|
@ -1,302 +0,0 @@
|
||||
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
|
||||
# This file was automatically generated from src/transformers/models/lightglue/modular_lightglue.py.
|
||||
# Do NOT edit this file manually as any edits will be overwritten by the generation of
|
||||
# the file from the modular. If any change should be done, please apply the change to the
|
||||
# modular_lightglue.py file directly. One of our CI enforces this.
|
||||
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
|
||||
# Copyright 2025 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
from typing import Optional, Union
|
||||
|
||||
import torch
|
||||
from torchvision.transforms.v2 import functional as F
|
||||
|
||||
from ...image_processing_utils import BatchFeature
|
||||
from ...image_processing_utils_fast import BaseImageProcessorFast
|
||||
from ...image_transforms import group_images_by_shape, reorder_images
|
||||
from ...image_utils import (
|
||||
ImageInput,
|
||||
ImageType,
|
||||
PILImageResampling,
|
||||
SizeDict,
|
||||
get_image_type,
|
||||
is_pil_image,
|
||||
is_valid_image,
|
||||
is_vision_available,
|
||||
to_numpy_array,
|
||||
)
|
||||
from ...processing_utils import Unpack
|
||||
from ...utils import TensorType, auto_docstring
|
||||
from .image_processing_lightglue import LightGlueImageProcessorKwargs
|
||||
from .modeling_lightglue import LightGlueKeypointMatchingOutput
|
||||
|
||||
|
||||
if is_vision_available():
|
||||
from PIL import Image, ImageDraw
|
||||
|
||||
|
||||
def _is_valid_image(image):
|
||||
return is_pil_image(image) or (
|
||||
is_valid_image(image) and get_image_type(image) != ImageType.PIL and len(image.shape) == 3
|
||||
)
|
||||
|
||||
|
||||
def flatten_pair_images(images):
|
||||
# Handle the pair validation and flattening similar to slow processor
|
||||
if isinstance(images, list):
|
||||
if len(images) == 2 and all((_is_valid_image(image) or isinstance(image, torch.Tensor)) for image in images):
|
||||
# Single pair of images - keep as is, they'll be processed by the base class
|
||||
return images
|
||||
elif all(
|
||||
isinstance(image_pair, list)
|
||||
and len(image_pair) == 2
|
||||
and all(_is_valid_image(image) or isinstance(image, torch.Tensor) for image in image_pair)
|
||||
for image_pair in images
|
||||
):
|
||||
# Multiple pairs - flatten them
|
||||
images = [image for image_pair in images for image in image_pair]
|
||||
return images
|
||||
    raise ValueError(
        "Input images must be one of the following:\n"
        " - A pair of PIL images.\n"
        " - A pair of 3D arrays.\n"
        " - A list of pairs of PIL images.\n"
        " - A list of pairs of 3D arrays."
    )
|
||||
|
||||
|
||||
def is_grayscale(
|
||||
image: "torch.Tensor",
|
||||
):
|
||||
"""Checks if an image is grayscale (all RGB channels are identical)."""
|
||||
if image.ndim < 3 or image.shape[0 if image.ndim == 3 else 1] == 1:
|
||||
return True
|
||||
return torch.all(image[..., 0, :, :] == image[..., 1, :, :]) and torch.all(
|
||||
image[..., 1, :, :] == image[..., 2, :, :]
|
||||
)
|
||||
|
||||
|
||||
def convert_to_grayscale(
|
||||
image: "torch.Tensor",
|
||||
) -> "torch.Tensor":
|
||||
"""
|
||||
Converts an image to grayscale format using the NTSC formula. Only supports `torch.Tensor`.
|
||||
|
||||
This function is supposed to return a 1-channel image, but it returns a 3-channel image with the same value in each
|
||||
channel, because of an issue that is discussed in:
|
||||
https://github.com/huggingface/transformers/pull/25786#issuecomment-1730176446
|
||||
|
||||
Args:
|
||||
image (torch.Tensor):
|
||||
The image to convert.
|
||||
"""
|
||||
if is_grayscale(image):
|
||||
return image
|
||||
return F.rgb_to_grayscale(image, num_output_channels=3)
|
||||
|
||||
|
||||
@auto_docstring
|
||||
class LightGlueImageProcessorFast(BaseImageProcessorFast):
|
||||
resample = PILImageResampling.BILINEAR
|
||||
size = {"height": 480, "width": 640}
|
||||
default_to_square = False
|
||||
do_resize = True
|
||||
do_rescale = True
|
||||
rescale_factor = 1 / 255
|
||||
do_normalize = None
|
||||
valid_kwargs = LightGlueImageProcessorKwargs
|
||||
|
||||
def __init__(self, **kwargs: Unpack[LightGlueImageProcessorKwargs]):
|
||||
super().__init__(**kwargs)
|
||||
|
||||
@auto_docstring
|
||||
def preprocess(self, images: ImageInput, **kwargs: Unpack[LightGlueImageProcessorKwargs]) -> BatchFeature:
|
||||
return super().preprocess(images, **kwargs)
|
||||
|
||||
def _prepare_images_structure(
|
||||
self,
|
||||
images: ImageInput,
|
||||
**kwargs,
|
||||
) -> ImageInput:
|
||||
# we need to handle image pairs validation and flattening
|
||||
return flatten_pair_images(images)
|
||||
|
||||
def _preprocess(
|
||||
self,
|
||||
images: list["torch.Tensor"],
|
||||
size: Union[dict[str, int], SizeDict],
|
||||
rescale_factor: float,
|
||||
do_rescale: bool,
|
||||
do_resize: bool,
|
||||
interpolation: Optional["F.InterpolationMode"],
|
||||
do_grayscale: bool,
|
||||
disable_grouping: bool,
|
||||
return_tensors: Union[str, TensorType],
|
||||
**kwargs,
|
||||
) -> BatchFeature:
|
||||
grouped_images, grouped_images_index = group_images_by_shape(images, disable_grouping=disable_grouping)
|
||||
processed_images_grouped = {}
|
||||
|
||||
for shape, stacked_images in grouped_images.items():
|
||||
if do_resize:
|
||||
stacked_images = self.resize(stacked_images, size=size, interpolation=interpolation)
|
||||
processed_images_grouped[shape] = stacked_images
|
||||
resized_images = reorder_images(processed_images_grouped, grouped_images_index)
|
||||
|
||||
grouped_images, grouped_images_index = group_images_by_shape(resized_images, disable_grouping=disable_grouping)
|
||||
processed_images_grouped = {}
|
||||
for shape, stacked_images in grouped_images.items():
|
||||
if do_rescale:
|
||||
stacked_images = self.rescale(stacked_images, rescale_factor)
|
||||
if do_grayscale:
|
||||
stacked_images = convert_to_grayscale(stacked_images)
|
||||
processed_images_grouped[shape] = stacked_images
|
||||
|
||||
processed_images = reorder_images(processed_images_grouped, grouped_images_index)
|
||||
|
||||
# Convert back to pairs format
|
||||
image_pairs = [processed_images[i : i + 2] for i in range(0, len(processed_images), 2)]
|
||||
|
||||
# Stack each pair into a single tensor to match slow processor format
|
||||
stacked_pairs = [torch.stack(pair, dim=0) for pair in image_pairs]
|
||||
|
||||
# Return in same format as slow processor
|
||||
image_pairs = torch.stack(stacked_pairs, dim=0) if return_tensors else stacked_pairs
|
||||
|
||||
return BatchFeature(data={"pixel_values": image_pairs})
|
||||
|
||||
def post_process_keypoint_matching(
|
||||
self,
|
||||
outputs: LightGlueKeypointMatchingOutput,
|
||||
target_sizes: Union[TensorType, list[tuple]],
|
||||
threshold: float = 0.0,
|
||||
) -> list[dict[str, torch.Tensor]]:
|
||||
"""
|
||||
Converts the raw output of [`LightGlueKeypointMatchingOutput`] into lists of keypoints, scores and descriptors
|
||||
with coordinates absolute to the original image sizes.
|
||||
Args:
|
||||
outputs ([`LightGlueKeypointMatchingOutput`]):
|
||||
Raw outputs of the model.
|
||||
target_sizes (`torch.Tensor` or `List[Tuple[Tuple[int, int]]]`, *optional*):
|
||||
Tensor of shape `(batch_size, 2, 2)` or list of tuples of tuples (`Tuple[int, int]`) containing the
|
||||
target size `(height, width)` of each image in the batch. This must be the original image size (before
|
||||
any processing).
|
||||
threshold (`float`, *optional*, defaults to 0.0):
|
||||
Threshold to filter out the matches with low scores.
|
||||
Returns:
|
||||
`List[Dict]`: A list of dictionaries, each dictionary containing the keypoints in the first and second image
|
||||
of the pair, the matching scores and the matching indices.
|
||||
"""
|
||||
if outputs.matches.shape[0] != len(target_sizes):
|
||||
raise ValueError("Make sure that you pass in as many target sizes as the batch dimension of the mask")
|
||||
if not all(len(target_size) == 2 for target_size in target_sizes):
|
||||
raise ValueError("Each element of target_sizes must contain the size (h, w) of each image of the batch")
|
||||
|
||||
if isinstance(target_sizes, list):
|
||||
image_pair_sizes = torch.tensor(target_sizes, device=outputs.matches.device)
|
||||
else:
|
||||
if target_sizes.shape[1] != 2 or target_sizes.shape[2] != 2:
|
||||
raise ValueError(
|
||||
"Each element of target_sizes must contain the size (h, w) of each image of the batch"
|
||||
)
|
||||
image_pair_sizes = target_sizes
|
||||
|
||||
keypoints = outputs.keypoints.clone()
|
||||
keypoints = keypoints * image_pair_sizes.flip(-1).reshape(-1, 2, 1, 2)
|
||||
keypoints = keypoints.to(torch.int32)
|
||||
|
||||
results = []
|
||||
for keypoints_pair, matches, scores in zip(keypoints, outputs.matches, outputs.matching_scores):
|
||||
# Filter out matches with low scores
|
||||
valid_matches = torch.logical_and(scores > threshold, matches > -1)
|
||||
|
||||
matched_keypoints0 = keypoints_pair[0][valid_matches[0]]
|
||||
matched_keypoints1 = keypoints_pair[1][valid_matches[1]]
|
||||
matching_scores = scores[0][valid_matches[0]]
|
||||
|
||||
results.append(
|
||||
{
|
||||
"keypoints0": matched_keypoints0,
|
||||
"keypoints1": matched_keypoints1,
|
||||
"matching_scores": matching_scores,
|
||||
}
|
||||
)
|
||||
|
||||
return results
|
||||
|
||||
def visualize_keypoint_matching(
|
||||
self,
|
||||
images,
|
||||
keypoint_matching_output: list[dict[str, torch.Tensor]],
|
||||
) -> list["Image.Image"]:
|
||||
"""
|
||||
Plots the image pairs side by side with the detected keypoints as well as the matching between them.
|
||||
|
||||
Args:
|
||||
images:
|
||||
Image pairs to plot. Same as `LightGlueImageProcessorFast.preprocess`. Expects either a list of 2
images or a list of lists of 2 images, with pixel values ranging from 0 to 255.
|
||||
keypoint_matching_output (`List[Dict[str, torch.Tensor]]`):
A post-processed keypoint matching output.
|
||||
|
||||
Returns:
|
||||
`List[PIL.Image.Image]`: A list of PIL images, each containing the image pairs side by side with the detected
|
||||
keypoints as well as the matching between them.
|
||||
"""
|
||||
from .image_processing_lightglue import validate_and_format_image_pairs
|
||||
|
||||
images = validate_and_format_image_pairs(images)
|
||||
images = [to_numpy_array(image) for image in images]
|
||||
image_pairs = [images[i : i + 2] for i in range(0, len(images), 2)]
|
||||
|
||||
results = []
|
||||
for image_pair, pair_output in zip(image_pairs, keypoint_matching_output):
|
||||
height0, width0 = image_pair[0].shape[:2]
|
||||
height1, width1 = image_pair[1].shape[:2]
|
||||
plot_image = torch.zeros((max(height0, height1), width0 + width1, 3), dtype=torch.uint8)
|
||||
plot_image[:height0, :width0] = torch.from_numpy(image_pair[0])
|
||||
plot_image[:height1, width0:] = torch.from_numpy(image_pair[1])
|
||||
|
||||
plot_image_pil = Image.fromarray(plot_image.numpy())
|
||||
draw = ImageDraw.Draw(plot_image_pil)
|
||||
|
||||
keypoints0_x, keypoints0_y = pair_output["keypoints0"].unbind(1)
|
||||
keypoints1_x, keypoints1_y = pair_output["keypoints1"].unbind(1)
|
||||
for keypoint0_x, keypoint0_y, keypoint1_x, keypoint1_y, matching_score in zip(
|
||||
keypoints0_x, keypoints0_y, keypoints1_x, keypoints1_y, pair_output["matching_scores"]
|
||||
):
|
||||
color = self._get_color(matching_score)
|
||||
draw.line(
|
||||
(keypoint0_x, keypoint0_y, keypoint1_x + width0, keypoint1_y),
|
||||
fill=color,
|
||||
width=3,
|
||||
)
|
||||
draw.ellipse((keypoint0_x - 2, keypoint0_y - 2, keypoint0_x + 2, keypoint0_y + 2), fill="black")
|
||||
draw.ellipse(
|
||||
(keypoint1_x + width0 - 2, keypoint1_y - 2, keypoint1_x + width0 + 2, keypoint1_y + 2),
|
||||
fill="black",
|
||||
)
|
||||
|
||||
results.append(plot_image_pil)
|
||||
return results
|
||||
|
||||
def _get_color(self, score):
|
||||
"""Maps a score to a color."""
|
||||
r = int(255 * (1 - score))
|
||||
g = int(255 * score)
|
||||
b = 0
|
||||
return r, g, b
|
||||
|
||||
|
||||
__all__ = ["LightGlueImageProcessorFast"]
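`post_process_keypoint_matching` scales relative keypoint coordinates by the original image size (x by width, y by height), truncates them to integers, and keeps only keypoints that have a match (`matches > -1`) and a score above `threshold`. A pure-Python sketch of that logic for a single image, with made-up sample values (the function `select_matches` is hypothetical; the diff performs the same steps on batched tensors and scales before filtering, which yields the same result):

```python
def select_matches(keypoints, matches, scores, size_hw, threshold=0.0):
    """Hypothetical single-image version of the scale-then-filter post-processing."""
    height, width = size_hw
    selected = []
    for (x, y), match, score in zip(keypoints, matches, scores):
        # Keep only keypoints that are matched and score above the threshold.
        if score > threshold and match > -1:
            # Relative (x, y) becomes absolute pixel coordinates, truncated to int.
            selected.append(((int(x * width), int(y * height)), score))
    return selected


result = select_matches(
    keypoints=[(0.5, 0.25), (0.1, 0.9), (0.75, 0.5)],
    matches=[2, -1, 0],
    scores=[0.9, 0.0, 0.3],
    size_hw=(480, 640),
    threshold=0.5,
)
print(result)
```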
|
@ -11,6 +11,7 @@
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import warnings
|
||||
from collections.abc import Callable
|
||||
from dataclasses import dataclass
|
||||
from typing import Optional, Union
|
||||
@ -21,24 +22,25 @@ from torch import nn
|
||||
from torch.nn.utils.rnn import pad_sequence
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...image_utils import ImageInput, is_vision_available, to_numpy_array
|
||||
from ...modeling_flash_attention_utils import FlashAttentionKwargs
|
||||
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
|
||||
from ...processing_utils import Unpack
|
||||
from ...utils import ModelOutput, TensorType, auto_docstring, logging
|
||||
from ...utils import ModelOutput, TensorType, auto_docstring, is_matplotlib_available, logging
|
||||
from ...utils.generic import can_return_tuple
|
||||
from ..auto import CONFIG_MAPPING, AutoConfig
|
||||
from ..auto.modeling_auto import AutoModelForKeypointDetection
|
||||
from ..clip.modeling_clip import CLIPMLP
|
||||
from ..cohere.modeling_cohere import apply_rotary_pos_emb
|
||||
from ..llama.modeling_llama import LlamaAttention, eager_attention_forward
|
||||
from ..superglue.image_processing_superglue import (
|
||||
SuperGlueImageProcessor,
|
||||
SuperGlueImageProcessorKwargs,
|
||||
)
|
||||
from ..superglue.image_processing_superglue_fast import SuperGlueImageProcessorFast
|
||||
from ..superglue.image_processing_superglue import SuperGlueImageProcessor, validate_and_format_image_pairs
|
||||
from ..superpoint import SuperPointConfig
|
||||
|
||||
|
||||
if is_vision_available():
|
||||
from PIL import Image, ImageDraw
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
|
||||
@ -215,10 +217,6 @@ class LightGlueKeypointMatchingOutput(ModelOutput):
|
||||
attentions: Optional[tuple[torch.FloatTensor]] = None
|
||||
|
||||
|
||||
class LightGlueImageProcessorKwargs(SuperGlueImageProcessorKwargs):
|
||||
pass
|
||||
|
||||
|
||||
class LightGlueImageProcessor(SuperGlueImageProcessor):
|
||||
def post_process_keypoint_matching(
|
||||
self,
|
||||
@ -228,15 +226,123 @@ class LightGlueImageProcessor(SuperGlueImageProcessor):
|
||||
) -> list[dict[str, torch.Tensor]]:
|
||||
return super().post_process_keypoint_matching(outputs, target_sizes, threshold)
|
||||
|
||||
|
||||
class LightGlueImageProcessorFast(SuperGlueImageProcessorFast):
|
||||
def post_process_keypoint_matching(
|
||||
# Copied from transformers.models.efficientloftr.image_processing_efficientloftr.EfficientLoFTRImageProcessor.visualize_keypoint_matching with EfficientLoFTR->LightGlue
|
||||
def visualize_keypoint_matching(
|
||||
self,
|
||||
outputs: LightGlueKeypointMatchingOutput,
|
||||
target_sizes: Union[TensorType, list[tuple]],
|
||||
threshold: float = 0.0,
|
||||
) -> list[dict[str, torch.Tensor]]:
|
||||
return super().post_process_keypoint_matching(outputs, target_sizes, threshold)
|
||||
images: ImageInput,
|
||||
keypoint_matching_output: list[dict[str, torch.Tensor]],
|
||||
) -> list["Image.Image"]:
|
||||
"""
|
||||
Plots the image pairs side by side with the detected keypoints as well as the matching between them.
|
||||
|
||||
Args:
|
||||
images (`ImageInput`):
|
||||
Image pairs to plot. Same as `LightGlueImageProcessor.preprocess`. Expects either a list of 2
|
||||
images or a list of lists of 2 images, with pixel values ranging from 0 to 255.
|
||||
keypoint_matching_output (`List[Dict[str, torch.Tensor]]`):
A post-processed keypoint matching output.
|
||||
|
||||
Returns:
|
||||
`List[PIL.Image.Image]`: A list of PIL images, each containing the image pairs side by side with the detected
|
||||
keypoints as well as the matching between them.
|
||||
"""
|
||||
images = validate_and_format_image_pairs(images)
|
||||
images = [to_numpy_array(image) for image in images]
|
||||
image_pairs = [images[i : i + 2] for i in range(0, len(images), 2)]
|
||||
|
||||
results = []
|
||||
for image_pair, pair_output in zip(image_pairs, keypoint_matching_output):
|
||||
height0, width0 = image_pair[0].shape[:2]
|
||||
height1, width1 = image_pair[1].shape[:2]
|
||||
plot_image = np.zeros((max(height0, height1), width0 + width1, 3), dtype=np.uint8)
|
||||
plot_image[:height0, :width0] = image_pair[0]
|
||||
plot_image[:height1, width0:] = image_pair[1]
|
||||
|
||||
plot_image_pil = Image.fromarray(plot_image)
|
||||
draw = ImageDraw.Draw(plot_image_pil)
|
||||
|
||||
keypoints0_x, keypoints0_y = pair_output["keypoints0"].unbind(1)
|
||||
keypoints1_x, keypoints1_y = pair_output["keypoints1"].unbind(1)
|
||||
for keypoint0_x, keypoint0_y, keypoint1_x, keypoint1_y, matching_score in zip(
|
||||
keypoints0_x, keypoints0_y, keypoints1_x, keypoints1_y, pair_output["matching_scores"]
|
||||
):
|
||||
color = self._get_color(matching_score)
|
||||
draw.line(
|
||||
(keypoint0_x, keypoint0_y, keypoint1_x + width0, keypoint1_y),
|
||||
fill=color,
|
||||
width=3,
|
||||
)
|
||||
draw.ellipse((keypoint0_x - 2, keypoint0_y - 2, keypoint0_x + 2, keypoint0_y + 2), fill="black")
|
||||
draw.ellipse(
|
||||
(keypoint1_x + width0 - 2, keypoint1_y - 2, keypoint1_x + width0 + 2, keypoint1_y + 2),
|
||||
fill="black",
|
||||
)
|
||||
|
||||
results.append(plot_image_pil)
|
||||
return results
|
||||
|
||||
# Copied from transformers.models.efficientloftr.image_processing_efficientloftr.EfficientLoFTRImageProcessor._get_color
|
||||
def _get_color(self, score):
|
||||
"""Maps a score to a color."""
|
||||
r = int(255 * (1 - score))
|
||||
g = int(255 * score)
|
||||
b = 0
|
||||
return (r, g, b)
|
||||
|
||||
def plot_keypoint_matching(self, images: ImageInput, keypoint_matching_output: LightGlueKeypointMatchingOutput):
|
||||
"""
|
||||
Plots the image pairs side by side with the detected keypoints as well as the matching between them. Requires
|
||||
matplotlib to be installed.
|
||||
|
||||
.. deprecated::
|
||||
`plot_keypoint_matching` is deprecated and will be removed in a future version. Use `visualize_keypoint_matching` instead.
|
||||
|
||||
Args:
|
||||
images (`ImageInput`):
|
||||
Image pairs to plot. Same as `LightGlueImageProcessor.preprocess`. Expects either a list of 2 images or
|
||||
a list of lists of 2 images, with pixel values ranging from 0 to 255.
|
||||
keypoint_matching_output ([`LightGlueKeypointMatchingOutput`]):
|
||||
Raw outputs of the model.
|
||||
"""
|
||||
warnings.warn(
|
||||
"`plot_keypoint_matching` is deprecated and will be removed in transformers v. "
|
||||
"Use `visualize_keypoint_matching` instead.",
|
||||
FutureWarning,
|
||||
)
|
||||
|
||||
if is_matplotlib_available():
|
||||
import matplotlib.pyplot as plt
|
||||
else:
|
||||
raise ImportError("Please install matplotlib to use `plot_keypoint_matching` method")
|
||||
|
||||
images = validate_and_format_image_pairs(images)
|
||||
images = [to_numpy_array(image) for image in images]
|
||||
image_pairs = [images[i : i + 2] for i in range(0, len(images), 2)]
|
||||
|
||||
for image_pair, pair_output in zip(image_pairs, keypoint_matching_output):
|
||||
height0, width0 = image_pair[0].shape[:2]
|
||||
height1, width1 = image_pair[1].shape[:2]
|
||||
plot_image = np.zeros((max(height0, height1), width0 + width1, 3))
|
||||
plot_image[:height0, :width0] = image_pair[0] / 255.0
|
||||
plot_image[:height1, width0:] = image_pair[1] / 255.0
|
||||
plt.imshow(plot_image)
|
||||
plt.axis("off")
|
||||
|
||||
keypoints0_x, keypoints0_y = pair_output["keypoints0"].unbind(1)
|
||||
keypoints1_x, keypoints1_y = pair_output["keypoints1"].unbind(1)
|
||||
for keypoint0_x, keypoint0_y, keypoint1_x, keypoint1_y, matching_score in zip(
|
||||
keypoints0_x, keypoints0_y, keypoints1_x, keypoints1_y, pair_output["matching_scores"]
|
||||
):
|
||||
plt.plot(
|
||||
[keypoint0_x, keypoint1_x + width0],
|
||||
[keypoint0_y, keypoint1_y],
|
||||
color=plt.get_cmap("RdYlGn")(matching_score.item()),
|
||||
alpha=0.9,
|
||||
linewidth=0.5,
|
||||
)
|
||||
plt.scatter(keypoint0_x, keypoint0_y, c="black", s=2)
|
||||
plt.scatter(keypoint1_x + width0, keypoint1_y, c="black", s=2)
|
||||
plt.show()
|
||||
|
||||
|
||||
class LightGluePositionalEncoder(nn.Module):
|
||||
@ -975,10 +1081,4 @@ class LightGlueForKeypointMatching(LightGluePreTrainedModel):
|
||||
)
|
||||
|
||||
|
||||
__all__ = [
|
||||
"LightGluePreTrainedModel",
|
||||
"LightGlueForKeypointMatching",
|
||||
"LightGlueConfig",
|
||||
"LightGlueImageProcessor",
|
||||
"LightGlueImageProcessorFast",
|
||||
]
|
||||
__all__ = ["LightGluePreTrainedModel", "LightGlueForKeypointMatching", "LightGlueConfig", "LightGlueImageProcessor"]
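The `_get_color` helper in the diff above maps a matching score to an RGB tuple by interpolating linearly between red (score 0) and green (score 1). A minimal standalone sketch of that mapping follows. The name `score_to_color` is hypothetical, and the sketch assumes scores lie in [0, 1], which the source does not enforce.

```python
def score_to_color(score: float) -> tuple[int, int, int]:
    """Map a matching score in [0, 1] to an (R, G, B) tuple."""
    r = int(255 * (1 - score))  # low scores lean red
    g = int(255 * score)  # high scores lean green
    b = 0  # blue is unused
    return (r, g, b)


low = score_to_color(0.0)
mid = score_to_color(0.5)
high = score_to_color(1.0)
```

Because `int` truncates, a score of 0.5 gives 127 for both channels rather than 128, so mid-range matches render as a dark olive.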
|
||||
|
@ -14,12 +14,20 @@
|
||||
# limitations under the License.
|
||||
"""Longformer configuration"""
|
||||
|
||||
from typing import Union
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import TYPE_CHECKING, Any, Optional, Union
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ...onnx.config import PatchingSpec
|
||||
from ...tokenization_utils_base import PreTrainedTokenizerBase
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
|
||||
@ -131,4 +139,71 @@ class LongformerConfig(PreTrainedConfig):
|
||||
self.onnx_export = onnx_export
|
||||
|
||||
|
||||
__all__ = ["LongformerConfig"]
|
||||
class LongformerOnnxConfig(OnnxConfig):
|
||||
def __init__(
|
||||
self, config: "PreTrainedConfig", task: str = "default", patching_specs: "Optional[list[PatchingSpec]]" = None
|
||||
):
|
||||
super().__init__(config, task, patching_specs)
|
||||
config.onnx_export = True
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("global_attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
outputs = super().outputs
|
||||
if self.task == "default":
|
||||
outputs["pooler_output"] = {0: "batch"}
|
||||
return outputs
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
"""
|
||||
What absolute tolerance value to use during model conversion validation.
|
||||
|
||||
Returns:
|
||||
Float absolute tolerance value.
|
||||
"""
|
||||
return 1e-4
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
# needs to be >= 14 to support tril operator
|
||||
return max(super().default_onnx_opset, 14)
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: "PreTrainedTokenizerBase",
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
inputs = super().generate_dummy_inputs(
|
||||
preprocessor=tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
is_pair=is_pair,
|
||||
)
|
||||
import torch
|
||||
|
||||
# for some reason, replacing this code with inputs["global_attention_mask"] = torch.randint(2, inputs["input_ids"].shape, dtype=torch.int64)
|
||||
# makes the export fail randomly
|
||||
inputs["global_attention_mask"] = torch.zeros_like(inputs["input_ids"])
|
||||
# make every second token global
|
||||
inputs["global_attention_mask"][:, ::2] = 1
|
||||
|
||||
return inputs
|
||||
|
||||
|
||||
__all__ = ["LongformerConfig", "LongformerOnnxConfig"]
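`LongformerOnnxConfig` above declares the dynamic axes for each input and builds a dummy `global_attention_mask` in which every second token is global. The sketch below reproduces both ideas with plain Python containers instead of tensors. The function names `dynamic_axes_for` and `alternating_global_mask` are hypothetical and not part of the library.

```python
from collections import OrderedDict


def dynamic_axes_for(task: str) -> OrderedDict:
    """Return the per-input dynamic axes, keyed by input name."""
    if task == "multiple-choice":
        axis = {0: "batch", 1: "choice", 2: "sequence"}
    else:
        axis = {0: "batch", 1: "sequence"}
    names = ("input_ids", "attention_mask", "global_attention_mask")
    # Copy the axis dict so each input owns an independent mapping.
    return OrderedDict((name, dict(axis)) for name in names)


def alternating_global_mask(batch_size: int, seq_length: int) -> list:
    """List-based analogue of zeros_like(...) followed by mask[:, ::2] = 1."""
    return [[1 if i % 2 == 0 else 0 for i in range(seq_length)] for _ in range(batch_size)]


default_axes = dynamic_axes_for("default")
choice_axes = dynamic_axes_for("multiple-choice")
mask = alternating_global_mask(batch_size=2, seq_length=5)
```

The deterministic alternating pattern matches the source's choice to avoid random masks, which its comment says made the export fail intermittently.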
|
||||
|
@ -14,7 +14,10 @@
|
||||
# limitations under the License.
|
||||
"""LongT5 model configuration"""
|
||||
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxSeq2SeqConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -149,4 +152,29 @@ class LongT5Config(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["LongT5Config"]
|
||||
class LongT5OnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = {
|
||||
"input_ids": {0: "batch", 1: "encoder_sequence"},
|
||||
"attention_mask": {0: "batch", 1: "encoder_sequence"},
|
||||
}
|
||||
if self.use_past:
|
||||
common_inputs["attention_mask"][1] = "past_encoder_sequence + sequence"
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
|
||||
__all__ = ["LongT5Config", "LongT5OnnxConfig"]
|
||||
|
@ -14,8 +14,15 @@
|
||||
# limitations under the License.
|
||||
"""M2M100 model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any
|
||||
|
||||
from ... import PreTrainedTokenizer
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...utils import logging
|
||||
from ...onnx import OnnxConfig, OnnxSeq2SeqConfigWithPast
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...utils import is_torch_available, logging
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
@ -151,4 +158,125 @@ class M2M100Config(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["M2M100Config"]
|
||||
class M2M100OnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
return common_inputs
|
||||
|
||||
# Copied from BartOnnxConfig._generate_dummy_inputs_for_sequence_classification_and_question_answering
|
||||
# A better name would be _generate_dummy_inputs_for_encoder_and_decoder because sequence classification and question
|
||||
# answering are not supported for M2M100, but this name is preserved to be able to check that the copy matches what
|
||||
# was done for BART so that it can be updated if need be.
|
||||
def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
# Copied from OnnxConfig.generate_dummy_inputs
|
||||
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(
|
||||
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
|
||||
)
|
||||
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
|
||||
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
|
||||
seq_length = compute_effective_axis_dimension(
|
||||
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
|
||||
)
|
||||
|
||||
# Generate dummy inputs according to the computed batch size and sequence length
|
||||
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
|
||||
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
|
||||
return common_inputs
|
||||
|
||||
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._generate_dummy_inputs_for_default_and_seq2seq_lm
|
||||
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
# Generate decoder inputs
|
||||
decoder_seq_length = seq_length if not self.use_past else 1
|
||||
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, decoder_seq_length, is_pair
|
||||
)
|
||||
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
|
||||
common_inputs = dict(**encoder_inputs, **decoder_inputs)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, encoder_seq_length = common_inputs["input_ids"].shape
|
||||
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
|
||||
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
|
||||
encoder_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
encoder_seq_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
decoder_past_length = decoder_seq_length + 3
|
||||
decoder_shape = (
|
||||
batch,
|
||||
num_decoder_attention_heads,
|
||||
decoder_past_length,
|
||||
self._config.hidden_size // num_decoder_attention_heads,
|
||||
)
|
||||
|
||||
common_inputs["decoder_attention_mask"] = torch.cat(
|
||||
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
|
||||
)
|
||||
|
||||
common_inputs["past_key_values"] = []
|
||||
# If the number of encoder and decoder layers are present in the model configuration, both are considered
|
||||
num_encoder_layers, num_decoder_layers = self.num_layers
|
||||
min_num_layers = min(num_encoder_layers, num_decoder_layers)
|
||||
max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
|
||||
remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"
|
||||
|
||||
for _ in range(min_num_layers):
|
||||
common_inputs["past_key_values"].append(
|
||||
(
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
)
|
||||
)
|
||||
# TODO: test this.
|
||||
shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
|
||||
for _ in range(min_num_layers, max_num_layers):
|
||||
common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
|
||||
return common_inputs
|
||||
|
||||
generate_dummy_inputs = _generate_dummy_inputs_for_default_and_seq2seq_lm
|
||||
|
||||
|
||||
__all__ = ["M2M100Config", "M2M100OnnxConfig"]
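The dummy-input helper above resolves a dynamic axis (`-1`) to a fixed size and then builds placeholder text from the tokenizer's unknown token. The sketch below mimics that flow without a real tokenizer. `effective_axis_dimension` is a hypothetical stand-in, not the library's `compute_effective_axis_dimension`, and its subtraction of special tokens is an assumption about how `num_token_to_add` is used. The fixed sizes 2 and 8 come from the comments in the diff; the `"<unk>"` token and the count of 2 special tokens are assumptions for illustration.

```python
def effective_axis_dimension(dimension: int, fixed_dimension: int, num_token_to_add: int = 0) -> int:
    # A non-positive value marks a dynamic axis: fall back to the fixed size.
    if dimension <= 0:
        dimension = fixed_dimension
    # Leave room for the special tokens the tokenizer is expected to add.
    return dimension - num_token_to_add


unk_token = "<unk>"  # assumed placeholder token
batch_size = effective_axis_dimension(-1, fixed_dimension=2)
seq_length = effective_axis_dimension(-1, fixed_dimension=8, num_token_to_add=2)

# " ".join on a one-element list returns that element unchanged, so the
# repetition concatenates the token seq_length times with no separator.
dummy_input = [" ".join([unk_token]) * seq_length] * batch_size
```

Since the joined string contains no spaces, how many tokens the tokenizer produces from it depends on how that tokenizer segments repeated unknown-token text.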
|
||||
|
@ -14,8 +14,15 @@
|
||||
# limitations under the License.
|
||||
"""Marian model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any
|
||||
|
||||
from ... import PreTrainedTokenizer
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...utils import logging
|
||||
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...utils import is_torch_available, logging
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
@ -157,4 +164,243 @@ class MarianConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["MarianConfig"]
|
||||
class MarianOnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.inputs
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
elif self.task == "causal-lm":
|
||||
# TODO: figure this case out.
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
|
||||
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.outputs
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_outputs = super().outputs
|
||||
else:
|
||||
common_outputs = super(OnnxConfigWithPast, self).outputs
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
return common_outputs
|
||||
|
||||
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
encoder_inputs = self._generate_dummy_inputs_for_encoder_and_decoder(
|
||||
tokenizer,
|
||||
batch_size,
|
||||
seq_length,
|
||||
is_pair,
|
||||
)
|
||||
|
||||
# Generate decoder inputs
|
||||
decoder_seq_length = seq_length if not self.use_past else 1
|
||||
decoder_inputs = self._generate_dummy_inputs_for_encoder_and_decoder(
|
||||
tokenizer,
|
||||
batch_size,
|
||||
decoder_seq_length,
|
||||
is_pair,
|
||||
)
|
||||
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
|
||||
common_inputs = dict(**encoder_inputs, **decoder_inputs)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, encoder_seq_length = common_inputs["input_ids"].shape
|
||||
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
|
||||
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
|
||||
encoder_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
encoder_seq_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
decoder_past_length = decoder_seq_length + 3
|
||||
decoder_shape = (
|
||||
batch,
|
||||
num_decoder_attention_heads,
|
||||
decoder_past_length,
|
||||
self._config.hidden_size // num_decoder_attention_heads,
|
||||
)
|
||||
|
||||
common_inputs["decoder_attention_mask"] = torch.cat(
|
||||
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
|
||||
)
|
||||
|
||||
common_inputs["past_key_values"] = []
|
||||
# If the number of encoder and decoder layers are present in the model configuration, both are considered
|
||||
num_encoder_layers, num_decoder_layers = self.num_layers
|
||||
min_num_layers = min(num_encoder_layers, num_decoder_layers)
|
||||
max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
|
||||
remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"
|
||||
|
||||
for _ in range(min_num_layers):
|
||||
common_inputs["past_key_values"].append(
|
||||
(
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
)
|
||||
)
|
||||
# TODO: test this.
|
||||
shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
|
||||
for _ in range(min_num_layers, max_num_layers):
|
||||
common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
|
||||
return common_inputs
|
||||
|
||||
def _generate_dummy_inputs_for_causal_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = self._generate_dummy_inputs_for_encoder_and_decoder(
|
||||
tokenizer,
|
||||
batch_size,
|
||||
seq_length,
|
||||
is_pair,
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
# Not using the same length for past_key_values
|
||||
past_key_values_length = seqlen + 2
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
num_encoder_attention_heads, _ = self.num_attention_heads
|
||||
past_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
|
||||
mask_dtype = common_inputs["attention_mask"].dtype
|
||||
common_inputs["attention_mask"] = torch.cat(
|
||||
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
common_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
|
||||
]
|
||||
return common_inputs
|
||||
|
||||
# Copied from BartOnnxConfig._generate_dummy_inputs_for_sequence_classification_and_question_answering
|
||||
# We renamed this function because Marian models do not have a sequence classification or question answering head
|
||||
def _generate_dummy_inputs_for_encoder_and_decoder(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
# Copied from OnnxConfig.generate_dummy_inputs
|
||||
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(
|
||||
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
|
||||
)
|
||||
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
|
||||
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
|
||||
seq_length = compute_effective_axis_dimension(
|
||||
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
|
||||
)
|
||||
|
||||
# Generate dummy inputs according to the computed batch size and sequence length
|
||||
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
|
||||
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
|
||||
return common_inputs
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
is_pair=is_pair,
|
||||
)
|
||||
|
||||
else:
|
||||
common_inputs = self._generate_dummy_inputs_for_causal_lm(
|
||||
tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
is_pair=is_pair,
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._flatten_past_key_values_
|
||||
def _flatten_past_key_values_(self, flattened_output, name, idx, t):
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
|
||||
else:
|
||||
flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
|
||||
flattened_output, name, idx, t
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["MarianConfig", "MarianOnnxConfig"]
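The `use_past` branch above fills `past_key_values` with zero tensors whose shapes depend on the batch size, head counts, and sequence lengths. The sketch below computes only those shapes, as plain tuples, so the layout can be inspected without PyTorch. The function name `dummy_past_layout` and its flat argument list are hypothetical; the arithmetic follows the diff.

```python
def dummy_past_layout(batch, enc_heads, dec_heads, enc_len, dec_len, hidden, enc_layers, dec_layers):
    """Return the per-layer tuple of tensor shapes used for dummy past_key_values."""
    encoder_shape = (batch, enc_heads, enc_len, hidden // enc_heads)
    decoder_past_length = dec_len + 3  # the source pads the decoder past by 3
    decoder_shape = (batch, dec_heads, decoder_past_length, hidden // dec_heads)

    min_layers = min(enc_layers, dec_layers)
    max_layers = max(enc_layers, dec_layers) - min_layers
    remaining = "encoder" if enc_layers > dec_layers else "decoder"

    # Layers present on both sides: decoder key/value, then encoder key/value.
    layout = [(decoder_shape, decoder_shape, encoder_shape, encoder_shape) for _ in range(min_layers)]

    # Same loop bounds as the source for the side with extra layers.
    shape = encoder_shape if remaining == "encoder" else decoder_shape
    layout += [(shape, shape) for _ in range(min_layers, max_layers)]
    return layout


equal = dummy_past_layout(2, 8, 8, 8, 1, 512, enc_layers=6, dec_layers=6)
uneven = dummy_past_layout(2, 8, 8, 8, 1, 512, enc_layers=2, dec_layers=6)
```

As written in the source, the second loop runs over `range(min_num_layers, max_num_layers)`, where `max_num_layers` is already the difference between the two layer counts, so it adds entries only when that difference exceeds the smaller count. The source marks this path `TODO: test this`.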
|
||||
|
@ -14,8 +14,15 @@
|
||||
# limitations under the License.
|
||||
"""MBART model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any
|
||||
|
||||
from ... import PreTrainedTokenizer
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...utils import logging
|
||||
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...utils import is_torch_available, logging
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
@ -157,4 +164,224 @@ class MBartConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["MBartConfig"]
|
||||
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig with Bart->MBart
|
||||
class MBartOnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
elif self.task == "causal-lm":
|
||||
# TODO: figure this case out.
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
|
||||
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_outputs = super().outputs
|
||||
else:
|
||||
common_outputs = super(OnnxConfigWithPast, self).outputs
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
return common_outputs
|
||||
|
||||
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
# Generate decoder inputs
|
||||
decoder_seq_length = seq_length if not self.use_past else 1
|
||||
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, decoder_seq_length, is_pair
|
||||
)
|
||||
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
|
||||
common_inputs = dict(**encoder_inputs, **decoder_inputs)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, encoder_seq_length = common_inputs["input_ids"].shape
|
||||
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
|
||||
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
|
||||
encoder_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
encoder_seq_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
decoder_past_length = decoder_seq_length + 3
|
||||
decoder_shape = (
|
||||
batch,
|
||||
num_decoder_attention_heads,
|
||||
decoder_past_length,
|
||||
self._config.hidden_size // num_decoder_attention_heads,
|
||||
)
|
||||
|
||||
common_inputs["decoder_attention_mask"] = torch.cat(
|
||||
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
|
||||
)
|
||||
|
||||
common_inputs["past_key_values"] = []
|
||||
# If the number of encoder and decoder layers are present in the model configuration, both are considered
|
||||
num_encoder_layers, num_decoder_layers = self.num_layers
|
||||
min_num_layers = min(num_encoder_layers, num_decoder_layers)
|
||||
max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
|
||||
remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"
|
||||
|
||||
for _ in range(min_num_layers):
|
||||
common_inputs["past_key_values"].append(
|
||||
(
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
)
|
||||
)
|
||||
# TODO: test this.
|
||||
shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
|
||||
for _ in range(min_num_layers, max_num_layers):
|
||||
common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
|
||||
return common_inputs
|
||||
|
||||
def _generate_dummy_inputs_for_causal_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
# Not using the same length for past_key_values
|
||||
past_key_values_length = seqlen + 2
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
num_encoder_attention_heads, _ = self.num_attention_heads
|
||||
past_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
|
||||
mask_dtype = common_inputs["attention_mask"].dtype
|
||||
common_inputs["attention_mask"] = torch.cat(
|
||||
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
common_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
|
||||
]
|
||||
return common_inputs
|
||||
|
||||
def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
# Copied from OnnxConfig.generate_dummy_inputs
|
||||
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
|
||||
# If the axis is dynamic (-1), forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(
|
||||
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
|
||||
)
|
||||
|
||||
# If the axis is dynamic (-1), forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
|
||||
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
|
||||
seq_length = compute_effective_axis_dimension(
|
||||
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
|
||||
)
|
||||
|
||||
# Generate dummy inputs according to the computed batch size and sequence length
|
||||
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
|
||||
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
|
||||
return common_inputs
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
elif self.task == "causal-lm":
|
||||
common_inputs = self._generate_dummy_inputs_for_causal_lm(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
else:
|
||||
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
def _flatten_past_key_values_(self, flattened_output, name, idx, t):
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
|
||||
else:
|
||||
flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
|
||||
flattened_output, name, idx, t
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["MBartConfig", "MBartOnnxConfig"]
|
||||
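The seq2seq dummy-input code above builds `past_key_values` from two shapes and two loops. The following is a minimal sketch of that layout using plain tuples instead of tensors, so it runs without PyTorch. The helper names `past_shape` and `past_key_values_layout` are hypothetical and not part of the library; the loop bounds deliberately mirror the ones shown in the diff, including the branch the source marks with `# TODO: test this.`

```python
def past_shape(batch, num_heads, past_length, hidden_size):
    """Shape of one cached key or value tensor: (batch, heads, past_len, head_dim)."""
    return (batch, num_heads, past_length, hidden_size // num_heads)


def past_key_values_layout(num_encoder_layers, num_decoder_layers, encoder_shape, decoder_shape):
    """Return one tuple of shapes per layer, following the loop structure in the diff."""
    min_num_layers = min(num_encoder_layers, num_decoder_layers)
    # As in the source, this is the difference between the two layer counts.
    max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
    remaining_side = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"

    layout = []
    # Shared layers carry decoder key/value plus encoder key/value.
    for _ in range(min_num_layers):
        layout.append((decoder_shape, decoder_shape, encoder_shape, encoder_shape))

    # Layers that exist on only one side carry a single key/value pair.
    shape = encoder_shape if remaining_side == "encoder" else decoder_shape
    for _ in range(min_num_layers, max_num_layers):
        layout.append((shape, shape))
    return layout


# Illustrative values only: hidden size 1024 split over 16 heads.
encoder_shape = past_shape(batch=2, num_heads=16, past_length=8, hidden_size=1024)
decoder_shape = past_shape(batch=2, num_heads=16, past_length=1 + 3, hidden_size=1024)
layout = past_key_values_layout(12, 12, encoder_shape, decoder_shape)
```

With equal encoder and decoder depths, every entry is a four-element tuple and the second loop does not run.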
|
@ -4,6 +4,7 @@
|
||||
# the file from the modular. If any change should be done, please apply the change to the
|
||||
# modular_metaclip_2.py file directly. One of our CI enforces this.
|
||||
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
@ -160,7 +160,8 @@ def eager_attention_forward(
|
||||
attention_mask: Optional[torch.Tensor],
|
||||
scaling: float,
|
||||
dropout: float = 0.0,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: bool = True,
|
||||
**kwargs,
|
||||
):
|
||||
attn_weights = torch.matmul(query, key.transpose(-1, -2)) * scaling
|
||||
if attention_mask is not None:
|
||||
@ -170,6 +171,8 @@ def eager_attention_forward(
|
||||
|
||||
attn_output = torch.matmul(attn_weights, value)
|
||||
attn_output = attn_output.transpose(1, 2).contiguous()
|
||||
if not output_attentions:
|
||||
attn_weights = None
|
||||
return attn_output, attn_weights
|
||||
|
||||
|
||||
@ -201,7 +204,7 @@ class MetaClip2Attention(nn.Module):
|
||||
hidden_states: torch.Tensor,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
causal_attention_mask: Optional[torch.Tensor] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: Optional[bool] = False,
|
||||
) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
|
||||
"""Input shape: Batch x Time x Channel"""
|
||||
|
||||
@ -237,12 +240,14 @@ class MetaClip2Attention(nn.Module):
|
||||
is_causal=self.is_causal,
|
||||
scaling=self.scale,
|
||||
dropout=0.0 if not self.training else self.dropout,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
)
|
||||
|
||||
attn_output = attn_output.reshape(batch_size, seq_length, embed_dim).contiguous()
|
||||
attn_output = self.out_proj(attn_output)
|
||||
|
||||
if not output_attentions:
|
||||
attn_weights = None
|
||||
return attn_output, attn_weights
|
||||
|
||||
|
||||
@ -261,41 +266,6 @@ class MetaClip2MLP(nn.Module):
|
||||
return hidden_states
|
||||
|
||||
|
||||
class MetaClip2EncoderLayer(GradientCheckpointingLayer):
|
||||
def __init__(self, config: Union[MetaClip2VisionConfig, MetaClip2TextConfig]):
|
||||
super().__init__()
|
||||
self.embed_dim = config.hidden_size
|
||||
self.self_attn = MetaClip2Attention(config)
|
||||
self.layer_norm1 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)
|
||||
self.mlp = MetaClip2MLP(config)
|
||||
self.layer_norm2 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)
|
||||
|
||||
def forward(
|
||||
self,
|
||||
hidden_states: torch.Tensor,
|
||||
attention_mask: torch.Tensor,
|
||||
causal_attention_mask: torch.Tensor,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> torch.FloatTensor:
|
||||
residual = hidden_states
|
||||
|
||||
hidden_states = self.layer_norm1(hidden_states)
|
||||
hidden_states, attn_weights = self.self_attn(
|
||||
hidden_states=hidden_states,
|
||||
attention_mask=attention_mask,
|
||||
causal_attention_mask=causal_attention_mask,
|
||||
**kwargs,
|
||||
)
|
||||
hidden_states = residual + hidden_states
|
||||
|
||||
residual = hidden_states
|
||||
hidden_states = self.layer_norm2(hidden_states)
|
||||
hidden_states = self.mlp(hidden_states)
|
||||
hidden_states = residual + hidden_states
|
||||
|
||||
return hidden_states
|
||||
|
||||
|
||||
@auto_docstring
|
||||
class MetaClip2PreTrainedModel(PreTrainedModel):
|
||||
config: MetaClip2Config
|
||||
@ -306,10 +276,6 @@ class MetaClip2PreTrainedModel(PreTrainedModel):
|
||||
_supports_flash_attn = True
|
||||
_supports_flex_attn = True
|
||||
_supports_attention_backend = True
|
||||
_can_record_outputs = {
|
||||
"hidden_states": MetaClip2EncoderLayer,
|
||||
"attentions": MetaClip2Attention,
|
||||
}
|
||||
|
||||
def _init_weights(self, module):
|
||||
"""Initialize the weights"""
|
||||
@ -368,6 +334,56 @@ class MetaClip2PreTrainedModel(PreTrainedModel):
|
||||
module.bias.data.zero_()
|
||||
|
||||
|
||||
class MetaClip2EncoderLayer(GradientCheckpointingLayer):
|
||||
def __init__(self, config: Union[MetaClip2VisionConfig, MetaClip2TextConfig]):
|
||||
super().__init__()
|
||||
self.embed_dim = config.hidden_size
|
||||
self.self_attn = MetaClip2Attention(config)
|
||||
self.layer_norm1 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)
|
||||
self.mlp = MetaClip2MLP(config)
|
||||
self.layer_norm2 = nn.LayerNorm(self.embed_dim, eps=config.layer_norm_eps)
|
||||
|
||||
def forward(
|
||||
self,
|
||||
hidden_states: torch.Tensor,
|
||||
attention_mask: torch.Tensor,
|
||||
causal_attention_mask: torch.Tensor,
|
||||
output_attentions: Optional[bool] = False,
|
||||
) -> tuple[torch.FloatTensor]:
|
||||
"""
|
||||
Args:
|
||||
hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)`
|
||||
attention_mask (`torch.FloatTensor`): attention mask of size
|
||||
`(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values.
|
||||
`(config.encoder_attention_heads,)`.
|
||||
output_attentions (`bool`, *optional*):
|
||||
Whether or not to return the attentions tensors of all attention layers. See `attentions` under
|
||||
returned tensors for more detail.
|
||||
"""
|
||||
residual = hidden_states
|
||||
|
||||
hidden_states = self.layer_norm1(hidden_states)
|
||||
hidden_states, attn_weights = self.self_attn(
|
||||
hidden_states=hidden_states,
|
||||
attention_mask=attention_mask,
|
||||
causal_attention_mask=causal_attention_mask,
|
||||
output_attentions=output_attentions,
|
||||
)
|
||||
hidden_states = residual + hidden_states
|
||||
|
||||
residual = hidden_states
|
||||
hidden_states = self.layer_norm2(hidden_states)
|
||||
hidden_states = self.mlp(hidden_states)
|
||||
hidden_states = residual + hidden_states
|
||||
|
||||
outputs = (hidden_states,)
|
||||
|
||||
if output_attentions:
|
||||
outputs += (attn_weights,)
|
||||
|
||||
return outputs
|
||||
|
||||
|
||||
class MetaClip2Encoder(nn.Module):
|
||||
"""
|
||||
Transformer encoder consisting of `config.num_hidden_layers` self attention layers. Each layer is a
|
||||
@ -388,7 +404,8 @@ class MetaClip2Encoder(nn.Module):
|
||||
inputs_embeds,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
causal_attention_mask: Optional[torch.Tensor] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
) -> BaseModelOutput:
|
||||
r"""
|
||||
Args:
|
||||
@ -410,18 +427,46 @@ class MetaClip2Encoder(nn.Module):
|
||||
- 0 for tokens that are **masked**.
|
||||
|
||||
[What are attention masks?](../glossary#attention-mask)
|
||||
output_attentions (`bool`, *optional*):
|
||||
Whether or not to return the attentions tensors of all attention layers. See `attentions` under
|
||||
returned tensors for more detail.
|
||||
output_hidden_states (`bool`, *optional*):
|
||||
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors
|
||||
for more detail.
|
||||
return_dict (`bool`, *optional*):
|
||||
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
|
||||
"""
|
||||
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
|
||||
output_hidden_states = (
|
||||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
|
||||
)
|
||||
|
||||
encoder_states = () if output_hidden_states else None
|
||||
all_attentions = () if output_attentions else None
|
||||
|
||||
hidden_states = inputs_embeds
|
||||
for encoder_layer in self.layers:
|
||||
hidden_states = encoder_layer(
|
||||
for idx, encoder_layer in enumerate(self.layers):
|
||||
if output_hidden_states:
|
||||
encoder_states = encoder_states + (hidden_states,)
|
||||
layer_outputs = encoder_layer(
|
||||
hidden_states,
|
||||
attention_mask,
|
||||
causal_attention_mask,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
)
|
||||
|
||||
hidden_states = layer_outputs[0]
|
||||
|
||||
if output_attentions:
|
||||
all_attentions = all_attentions + (layer_outputs[1],)
|
||||
|
||||
if output_hidden_states:
|
||||
encoder_states = encoder_states + (hidden_states,)
|
||||
|
||||
return BaseModelOutput(
|
||||
last_hidden_state=hidden_states,
|
||||
hidden_states=encoder_states,
|
||||
attentions=all_attentions,
|
||||
)
|
||||
|
||||
|
||||
@ -541,7 +586,6 @@ class MetaClip2TextModel(MetaClip2PreTrainedModel):
|
||||
def set_input_embeddings(self, value):
|
||||
self.text_model.embeddings.token_embedding = value
|
||||
|
||||
@check_model_inputs()
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
@ -551,7 +595,6 @@ class MetaClip2TextModel(MetaClip2PreTrainedModel):
|
||||
position_ids: Optional[torch.Tensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> BaseModelOutputWithPooling:
|
||||
r"""
|
||||
Examples:
|
||||
@ -573,7 +616,8 @@ class MetaClip2TextModel(MetaClip2PreTrainedModel):
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
position_ids=position_ids,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
|
||||
@ -650,7 +694,6 @@ class MetaClip2TextModelWithProjection(MetaClip2PreTrainedModel):
|
||||
def set_input_embeddings(self, value):
|
||||
self.text_model.embeddings.token_embedding = value
|
||||
|
||||
@check_model_inputs()
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
@ -660,7 +703,6 @@ class MetaClip2TextModelWithProjection(MetaClip2PreTrainedModel):
|
||||
position_ids: Optional[torch.Tensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> MetaClip2TextModelOutput:
|
||||
r"""
|
||||
Examples:
|
||||
@ -681,7 +723,8 @@ class MetaClip2TextModelWithProjection(MetaClip2PreTrainedModel):
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
position_ids=position_ids,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
pooled_output = text_outputs.pooler_output
|
||||
text_embeds = self.text_projection(pooled_output)
|
||||
@ -689,6 +732,8 @@ class MetaClip2TextModelWithProjection(MetaClip2PreTrainedModel):
|
||||
return MetaClip2TextModelOutput(
|
||||
text_embeds=text_embeds,
|
||||
last_hidden_state=text_outputs.last_hidden_state,
|
||||
hidden_states=text_outputs.hidden_states,
|
||||
attentions=text_outputs.attentions,
|
||||
)
|
||||
|
||||
|
||||
@ -836,6 +881,8 @@ class MetaClip2Model(MetaClip2PreTrainedModel):
|
||||
input_ids: Optional[torch.Tensor] = None,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
position_ids: Optional[torch.Tensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
) -> torch.FloatTensor:
|
||||
r"""
|
||||
Returns:
|
||||
@ -868,6 +915,8 @@ class MetaClip2Model(MetaClip2PreTrainedModel):
|
||||
def get_image_features(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
) -> torch.FloatTensor:
|
||||
r"""
|
||||
@ -910,8 +959,9 @@ class MetaClip2Model(MetaClip2PreTrainedModel):
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
position_ids: Optional[torch.LongTensor] = None,
|
||||
return_loss: Optional[bool] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> MetaClip2Output:
|
||||
r"""
|
||||
return_loss (`bool`, *optional*):
|
||||
@ -938,17 +988,25 @@ class MetaClip2Model(MetaClip2PreTrainedModel):
|
||||
>>> logits_per_image = outputs.logits_per_image # this is the image-text similarity score
|
||||
>>> probs = logits_per_image.softmax(dim=1) # we can take the softmax to get the label probabilities
|
||||
```"""
|
||||
# Use METACLIP_2 model's config for some fields (if specified) instead of those of vision & text components.
|
||||
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
|
||||
output_hidden_states = (
|
||||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
|
||||
)
|
||||
|
||||
vision_outputs: BaseModelOutputWithPooling = self.vision_model(
|
||||
pixel_values=pixel_values,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
text_outputs: BaseModelOutputWithPooling = self.text_model(
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
position_ids=position_ids,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
image_embeds = vision_outputs.pooler_output
|
||||
@ -997,9 +1055,15 @@ class MetaClip2VisionTransformer(nn.Module):
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: Optional[bool] = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> BaseModelOutputWithPooling:
|
||||
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
|
||||
output_hidden_states = (
|
||||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
|
||||
)
|
||||
|
||||
if pixel_values is None:
|
||||
raise ValueError("You have to specify pixel_values")
|
||||
|
||||
@ -1008,7 +1072,8 @@ class MetaClip2VisionTransformer(nn.Module):
|
||||
|
||||
encoder_outputs: BaseModelOutput = self.encoder(
|
||||
inputs_embeds=hidden_states,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
last_hidden_state = encoder_outputs.last_hidden_state
|
||||
@ -1018,6 +1083,8 @@ class MetaClip2VisionTransformer(nn.Module):
|
||||
return BaseModelOutputWithPooling(
|
||||
last_hidden_state=last_hidden_state,
|
||||
pooler_output=pooled_output,
|
||||
hidden_states=encoder_outputs.hidden_states,
|
||||
attentions=encoder_outputs.attentions,
|
||||
)
|
||||
|
||||
|
||||
@ -1077,13 +1144,14 @@ class MetaClip2VisionModel(MetaClip2PreTrainedModel):
|
||||
def get_input_embeddings(self) -> nn.Module:
|
||||
return self.vision_model.embeddings.patch_embedding
|
||||
|
||||
@check_model_inputs(tie_last_hidden_states=False)
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> BaseModelOutputWithPooling:
|
||||
r"""
|
||||
Examples:
|
||||
@ -1108,8 +1176,9 @@ class MetaClip2VisionModel(MetaClip2PreTrainedModel):
|
||||
|
||||
return self.vision_model(
|
||||
pixel_values=pixel_values,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
@ -1186,14 +1255,14 @@ class MetaClip2VisionModelWithProjection(MetaClip2PreTrainedModel):
|
||||
def get_input_embeddings(self) -> nn.Module:
|
||||
return self.vision_model.embeddings.patch_embedding
|
||||
|
||||
@check_model_inputs(tie_last_hidden_states=False)
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
) -> MetaClip2VisionModelOutput:
|
||||
r"""
|
||||
Examples:
|
||||
@ -1217,8 +1286,9 @@ class MetaClip2VisionModelWithProjection(MetaClip2PreTrainedModel):
|
||||
|
||||
vision_outputs: BaseModelOutputWithPooling = self.vision_model(
|
||||
pixel_values=pixel_values,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
**kwargs,
|
||||
)
|
||||
pooled_output = vision_outputs.pooler_output
|
||||
image_embeds = self.visual_projection(pooled_output)
|
||||
@ -1226,6 +1296,8 @@ class MetaClip2VisionModelWithProjection(MetaClip2PreTrainedModel):
|
||||
return MetaClip2VisionModelOutput(
|
||||
image_embeds=image_embeds,
|
||||
last_hidden_state=vision_outputs.last_hidden_state,
|
||||
hidden_states=vision_outputs.hidden_states,
|
||||
attentions=vision_outputs.attentions,
|
||||
)
|
||||
|
||||
|
||||
@ -1254,14 +1326,14 @@ class MetaClip2ForImageClassification(MetaClip2PreTrainedModel):
|
||||
# Initialize weights and apply final processing
|
||||
self.post_init()
|
||||
|
||||
@check_model_inputs()
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.Tensor] = None,
|
||||
labels: Optional[torch.Tensor] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
) -> ImageClassifierOutput:
|
||||
r"""
|
||||
labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
|
||||
@ -1269,14 +1341,22 @@ class MetaClip2ForImageClassification(MetaClip2PreTrainedModel):
|
||||
config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss). If
|
||||
`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
|
||||
"""
|
||||
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
|
||||
output_hidden_states = (
|
||||
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
|
||||
)
|
||||
|
||||
outputs: BaseModelOutputWithPooling = self.vision_model(
|
||||
pixel_values,
|
||||
**kwargs,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
sequence_output = outputs.last_hidden_state
|
||||
|
||||
# average pool the patch tokens
|
||||
sequence_output = torch.mean(sequence_output[:, 1:, :], dim=1)
|
||||
# apply classifier
|
||||
logits = self.classifier(sequence_output)
|
||||
|
||||
loss = None
|
||||
@ -1286,6 +1366,8 @@ class MetaClip2ForImageClassification(MetaClip2PreTrainedModel):
|
||||
return ImageClassifierOutput(
|
||||
loss=loss,
|
||||
logits=logits,
|
||||
hidden_states=outputs.hidden_states,
|
||||
attentions=outputs.attentions,
|
||||
)
|
||||
|
||||
|
||||
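The hunks above thread explicit `output_attentions` and `output_hidden_states` flags through the encoder: each layer returns a tuple, and the encoder accumulates per-layer results only when asked. The sketch below reproduces that control flow with plain lists so it runs without PyTorch. `toy_layer` and `run_encoder` are hypothetical stand-ins, not library functions, and the arithmetic inside `toy_layer` is a placeholder for real attention and MLP blocks.

```python
def toy_layer(hidden_states, output_attentions=False):
    """Hypothetical encoder layer: returns (hidden_states,) plus optional attention weights."""
    new_hidden = [h + 1 for h in hidden_states]
    # Placeholder weights: uniform attention over the sequence.
    attn_weights = [1.0 / len(hidden_states)] * len(hidden_states)

    outputs = (new_hidden,)
    if output_attentions:
        outputs += (attn_weights,)
    return outputs


def run_encoder(inputs_embeds, layers, output_attentions=False, output_hidden_states=False):
    """Accumulate per-layer states and attentions the way the encoder loop in the diff does."""
    encoder_states = () if output_hidden_states else None
    all_attentions = () if output_attentions else None

    hidden_states = inputs_embeds
    for layer in layers:
        if output_hidden_states:
            # Record the input to each layer before it is transformed.
            encoder_states = encoder_states + (hidden_states,)
        layer_outputs = layer(hidden_states, output_attentions=output_attentions)
        hidden_states = layer_outputs[0]
        if output_attentions:
            all_attentions = all_attentions + (layer_outputs[1],)

    if output_hidden_states:
        # Record the final output as well, giving num_layers + 1 entries.
        encoder_states = encoder_states + (hidden_states,)

    return {
        "last_hidden_state": hidden_states,
        "hidden_states": encoder_states,
        "attentions": all_attentions,
    }


full = run_encoder([0, 0], [toy_layer] * 3, output_attentions=True, output_hidden_states=True)
minimal = run_encoder([0, 0], [toy_layer] * 3)
```

With both flags off, the optional fields stay `None`, which matches how the model outputs in the diff pass `encoder_outputs.hidden_states` and `encoder_outputs.attentions` straight through.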
|
@ -7,13 +7,12 @@ from ...modeling_attn_mask_utils import _create_4d_causal_attention_mask, _prepa
|
||||
from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling
|
||||
from ...modeling_utils import PreTrainedModel
|
||||
from ...processing_utils import Unpack
|
||||
from ...utils import TransformersKwargs, auto_docstring, can_return_tuple, logging
|
||||
from ...utils import TransformersKwargs, auto_docstring, logging
|
||||
from ...utils.generic import check_model_inputs
|
||||
from ..clip.configuration_clip import CLIPConfig, CLIPTextConfig, CLIPVisionConfig
|
||||
from ..clip.modeling_clip import (
|
||||
CLIPMLP,
|
||||
CLIPAttention,
|
||||
CLIPEncoderLayer,
|
||||
CLIPForImageClassification,
|
||||
CLIPModel,
|
||||
CLIPTextEmbeddings,
|
||||
@ -214,10 +213,6 @@ class MetaClip2MLP(CLIPMLP):
|
||||
pass
|
||||
|
||||
|
||||
class MetaClip2EncoderLayer(CLIPEncoderLayer):
|
||||
pass
|
||||
|
||||
|
||||
@auto_docstring
|
||||
class MetaClip2PreTrainedModel(PreTrainedModel):
|
||||
config: MetaClip2Config
|
||||
@ -228,10 +223,6 @@ class MetaClip2PreTrainedModel(PreTrainedModel):
|
||||
_supports_flash_attn = True
|
||||
_supports_flex_attn = True
|
||||
_supports_attention_backend = True
|
||||
_can_record_outputs = {
|
||||
"hidden_states": MetaClip2EncoderLayer,
|
||||
"attentions": MetaClip2Attention,
|
||||
}
|
||||
|
||||
def _init_weights(self, module):
|
||||
"""Initialize the weights"""
|
||||
@ -378,9 +369,6 @@ class MetaClip2TextModel(CLIPTextModel):
|
||||
# Initialize weights and apply final processing
|
||||
self.post_init()
|
||||
|
||||
@check_model_inputs()
|
||||
@can_return_tuple
|
||||
@auto_docstring
|
||||
def forward(
|
||||
self,
|
||||
input_ids: Optional[torch.Tensor] = None,
|
||||
@ -388,7 +376,6 @@ class MetaClip2TextModel(CLIPTextModel):
|
||||
position_ids: Optional[torch.Tensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
):
|
||||
r"""
|
||||
Examples:
|
||||
@ -411,7 +398,6 @@ class MetaClip2TextModel(CLIPTextModel):
|
||||
position_ids=position_ids,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
@ -464,7 +450,6 @@ class MetaClip2TextModelWithProjection(CLIPTextModelWithProjection):
|
||||
position_ids: Optional[torch.Tensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
):
|
||||
r"""
|
||||
Examples:
|
||||
@ -486,7 +471,6 @@ class MetaClip2TextModelWithProjection(CLIPTextModelWithProjection):
|
||||
position_ids=position_ids,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
@ -557,8 +541,9 @@ class MetaClip2Model(CLIPModel):
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
position_ids: Optional[torch.LongTensor] = None,
|
||||
return_loss: Optional[bool] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
):
|
||||
r"""
|
||||
return_loss (`bool`, *optional*):
|
||||
@ -591,8 +576,9 @@ class MetaClip2Model(CLIPModel):
|
||||
attention_mask=attention_mask,
|
||||
position_ids=position_ids,
|
||||
return_loss=return_loss,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
def get_text_features(
|
||||
@ -600,6 +586,8 @@ class MetaClip2Model(CLIPModel):
|
||||
input_ids: Optional[torch.Tensor] = None,
|
||||
attention_mask: Optional[torch.Tensor] = None,
|
||||
position_ids: Optional[torch.Tensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
):
|
||||
r"""
|
||||
Returns:
|
||||
@ -621,11 +609,15 @@ class MetaClip2Model(CLIPModel):
|
||||
input_ids=input_ids,
|
||||
attention_mask=attention_mask,
|
||||
position_ids=position_ids,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
)
|
||||
|
||||
def get_image_features(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
):
|
||||
r"""
|
||||
@ -652,6 +644,8 @@ class MetaClip2Model(CLIPModel):
|
||||
```"""
|
||||
return super().get_image_features(
|
||||
pixel_values=pixel_values,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
)
|
||||
|
||||
@ -693,13 +687,12 @@ class MetaClip2VisionModel(CLIPVisionModel):
|
||||
>>> pooled_output = outputs.pooler_output # pooled CLS states
|
||||
```"""
|
||||
|
||||
@check_model_inputs(tie_last_hidden_states=False)
|
||||
@can_return_tuple
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
):
|
||||
r"""
|
||||
Examples:
|
||||
@ -723,8 +716,9 @@ class MetaClip2VisionModel(CLIPVisionModel):
|
||||
```"""
|
||||
return super().forward(
|
||||
pixel_values=pixel_values,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
@ -767,8 +761,9 @@ class MetaClip2VisionModelWithProjection(CLIPVisionModelWithProjection):
|
||||
def forward(
|
||||
self,
|
||||
pixel_values: Optional[torch.FloatTensor] = None,
|
||||
output_attentions: Optional[bool] = None,
|
||||
output_hidden_states: Optional[bool] = None,
|
||||
interpolate_pos_encoding: bool = False,
|
||||
**kwargs: Unpack[TransformersKwargs],
|
||||
):
|
||||
r"""
|
||||
Examples:
|
||||
@ -791,8 +786,9 @@ class MetaClip2VisionModelWithProjection(CLIPVisionModelWithProjection):
|
||||
```"""
|
||||
return super().forward(
|
||||
pixel_values=pixel_values,
|
||||
output_attentions=output_attentions,
|
||||
output_hidden_states=output_hidden_states,
|
||||
interpolate_pos_encoding=interpolate_pos_encoding,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""MobileBERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -160,4 +164,21 @@ class MobileBertConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["MobileBertConfig"]
|
||||
# Copied from transformers.models.bert.configuration_bert.BertOnnxConfig with Bert->MobileBert
|
||||
class MobileBertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["MobileBertConfig", "MobileBertOnnxConfig"]
|
||||
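The `inputs` property above maps each input name to its dynamic axes, with an extra `choice` axis for the multiple-choice task. A small standalone sketch of the same mapping follows; `onnx_inputs` is a hypothetical free function used here only so the logic can run outside the config class.

```python
from collections import OrderedDict


def onnx_inputs(task):
    """Return the input-name to dynamic-axes mapping for a given task name."""
    if task == "multiple-choice":
        # Multiple-choice inputs carry an extra per-example "choice" dimension.
        dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
    else:
        dynamic_axis = {0: "batch", 1: "sequence"}
    return OrderedDict(
        [
            ("input_ids", dynamic_axis),
            ("attention_mask", dynamic_axis),
            ("token_type_ids", dynamic_axis),
        ]
    )


default_inputs = onnx_inputs("default")
choice_inputs = onnx_inputs("multiple-choice")
```

The ordered mapping matters because the input names are listed in a fixed order, which an exporter can rely on when matching them to model arguments.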
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""MobileNetV1 model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -98,4 +104,23 @@ class MobileNetV1Config(PreTrainedConfig):
|
||||
self.layer_norm_eps = layer_norm_eps
|
||||
|
||||
|
||||
__all__ = ["MobileNetV1Config"]
|
||||
class MobileNetV1OnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict([("pixel_values", {0: "batch"})])
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "image-classification":
|
||||
return OrderedDict([("logits", {0: "batch"})])
|
||||
else:
|
||||
return OrderedDict([("last_hidden_state", {0: "batch"}), ("pooler_output", {0: "batch"})])
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["MobileNetV1Config", "MobileNetV1OnnxConfig"]
|
||||
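The config above declares task-dependent outputs and an absolute tolerance of `1e-4` for validation. The sketch below shows one way such a tolerance could be applied, assuming a simple element-wise absolute-difference check; the exporter's actual comparison may differ. `onnx_outputs` and `within_atol` are hypothetical helper names.

```python
from collections import OrderedDict


def onnx_outputs(task):
    """Return the output-name to dynamic-axes mapping, mirroring the property in the diff."""
    if task == "image-classification":
        return OrderedDict([("logits", {0: "batch"})])
    return OrderedDict([("last_hidden_state", {0: "batch"}), ("pooler_output", {0: "batch"})])


def within_atol(reference, candidate, atol=1e-4):
    """Assumed check: every element differs from the reference by at most atol."""
    if len(reference) != len(candidate):
        return False
    return all(abs(r - c) <= atol for r, c in zip(reference, candidate))


close_enough = within_atol([1.0, 2.0], [1.00005, 2.0])
too_far = within_atol([1.0], [1.001])
```

A looser tolerance than the default is a common choice for convolutional vision models, where small numerical differences between runtimes accumulate across layers.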
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""MobileNetV2 model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -126,4 +132,23 @@ class MobileNetV2Config(PreTrainedConfig):
|
||||
self.semantic_loss_ignore_index = semantic_loss_ignore_index
|
||||
|
||||
|
||||
__all__ = ["MobileNetV2Config"]
|
||||
class MobileNetV2OnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict([("pixel_values", {0: "batch"})])
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "image-classification":
|
||||
return OrderedDict([("logits", {0: "batch"})])
|
||||
else:
|
||||
return OrderedDict([("last_hidden_state", {0: "batch"}), ("pooler_output", {0: "batch"})])
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["MobileNetV2Config", "MobileNetV2OnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""MobileViT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -144,4 +150,23 @@ class MobileViTConfig(PreTrainedConfig):
|
||||
self.semantic_loss_ignore_index = semantic_loss_ignore_index
|
||||
|
||||
|
||||
__all__ = ["MobileViTConfig"]
|
||||
class MobileViTOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict([("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"})])
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "image-classification":
|
||||
return OrderedDict([("logits", {0: "batch"})])
|
||||
else:
|
||||
return OrderedDict([("last_hidden_state", {0: "batch"}), ("pooler_output", {0: "batch"})])
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["MobileViTConfig", "MobileViTOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""MobileViTV2 model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -140,4 +146,23 @@ class MobileViTV2Config(PreTrainedConfig):
|
||||
self.semantic_loss_ignore_index = semantic_loss_ignore_index
|
||||
|
||||
|
||||
__all__ = ["MobileViTV2Config"]
|
||||
class MobileViTV2OnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict([("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"})])
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "image-classification":
|
||||
return OrderedDict([("logits", {0: "batch"})])
|
||||
else:
|
||||
return OrderedDict([("last_hidden_state", {0: "batch"}), ("pooler_output", {0: "batch"})])
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["MobileViTV2Config", "MobileViTV2OnnxConfig"]
|
||||
|
@ -905,8 +905,6 @@ class ModernBertModel(ModernBertPreTrainedModel):
|
||||
inputs_embeds, indices, cu_seqlens, max_seqlen, *_ = _unpad_modernbert_input(
|
||||
inputs=inputs_embeds, attention_mask=attention_mask
|
||||
)
|
||||
if position_ids is None:
|
||||
position_ids = indices.unsqueeze(0)
|
||||
else:
|
||||
if position_ids is None:
|
||||
position_ids = torch.arange(seq_len, device=device).unsqueeze(0)
|
||||
|
@ -1014,8 +1014,6 @@ class ModernBertModel(ModernBertPreTrainedModel):
|
||||
inputs_embeds, indices, cu_seqlens, max_seqlen, *_ = _unpad_modernbert_input(
|
||||
inputs=inputs_embeds, attention_mask=attention_mask
|
||||
)
|
||||
if position_ids is None:
|
||||
position_ids = indices.unsqueeze(0)
|
||||
else:
|
||||
if position_ids is None:
|
||||
position_ids = torch.arange(seq_len, device=device).unsqueeze(0)
|
||||
|
@ -14,7 +14,10 @@
|
||||
# limitations under the License.
|
||||
"""mT5 model configuration"""
|
||||
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxSeq2SeqConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -145,4 +148,35 @@ class MT5Config(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["MT5Config"]
|
||||
class MT5OnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
# Copied from transformers.models.t5.configuration_t5.T5OnnxConfig.inputs
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = {
|
||||
"input_ids": {0: "batch", 1: "encoder_sequence"},
|
||||
"attention_mask": {0: "batch", 1: "encoder_sequence"},
|
||||
}
|
||||
if self.use_past:
|
||||
common_inputs["attention_mask"][1] = "past_encoder_sequence + sequence"
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
# Copied from transformers.models.t5.configuration_t5.T5OnnxConfig.default_onnx_opset
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 5e-4
|
||||
|
||||
|
||||
__all__ = ["MT5Config", "MT5OnnxConfig"]
|
||||
|
@ -14,7 +14,16 @@
|
||||
# limitations under the License.
|
||||
"""OWL-ViT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ...processing_utils import ProcessorMixin
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -265,4 +274,52 @@ class OwlViTConfig(PreTrainedConfig):
|
||||
super().__init__(**kwargs)
|
||||
|
||||
|
||||
__all__ = ["OwlViTConfig", "OwlViTTextConfig", "OwlViTVisionConfig"]
|
||||
class OwlViTOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "sequence"}),
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
("attention_mask", {0: "batch", 1: "sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("logits_per_image", {0: "batch"}),
|
||||
("logits_per_text", {0: "batch"}),
|
||||
("text_embeds", {0: "batch"}),
|
||||
("image_embeds", {0: "batch"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
processor: "ProcessorMixin",
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
) -> Mapping[str, Any]:
|
||||
text_input_dict = super().generate_dummy_inputs(
|
||||
processor.tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
)
|
||||
image_input_dict = super().generate_dummy_inputs(
|
||||
processor.image_processor,
|
||||
batch_size=batch_size,
|
||||
)
|
||||
return {**text_input_dict, **image_input_dict}
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 14
|
||||
|
||||
|
||||
__all__ = ["OwlViTConfig", "OwlViTOnnxConfig", "OwlViTTextConfig", "OwlViTVisionConfig"]
|
||||
|
@ -116,23 +116,15 @@ def token_type_ids_mask_function(
|
||||
# If it's 1 for both query and key/value, we are in an image block
|
||||
# NOTE: static cache shape goes beyond input seq length, while token_type_ids.shape[1] == input seq length
|
||||
# Since vmap doesn't support `if statement` we workaround it with `torch.where`
|
||||
safe_q_idx = torch.where(q_idx < token_type_ids.shape[1], q_idx, 0)
|
||||
safe_kv_idx = torch.where(kv_idx < token_type_ids.shape[1], kv_idx, 0)
|
||||
|
||||
token_type_ids_at_q_idx = token_type_ids[batch_idx, safe_q_idx]
|
||||
token_type_ids_at_q_idx = torch.where(q_idx < token_type_ids.shape[1], token_type_ids_at_q_idx, 0)
|
||||
|
||||
token_type_ids_at_kv_idx = token_type_ids[batch_idx, safe_kv_idx]
|
||||
safe_idx = torch.where(kv_idx < token_type_ids.shape[1], kv_idx, 0)
|
||||
token_type_ids_at_kv_idx = token_type_ids[batch_idx, safe_idx]
|
||||
token_type_ids_at_kv_idx = torch.where(kv_idx < token_type_ids.shape[1], token_type_ids_at_kv_idx, 0)
|
||||
|
||||
image_group_ids_at_q_idx = image_group_ids[batch_idx, safe_q_idx]
|
||||
image_group_ids_at_q_idx = torch.where(q_idx < image_group_ids.shape[1], image_group_ids_at_q_idx, -1)
|
||||
|
||||
image_group_ids_at_kv_idx = image_group_ids[batch_idx, safe_kv_idx]
|
||||
image_group_ids_at_kv_idx = image_group_ids[batch_idx, safe_idx]
|
||||
image_group_ids_at_kv_idx = torch.where(kv_idx < image_group_ids.shape[1], image_group_ids_at_kv_idx, -1)
|
||||
|
||||
is_image_block = (token_type_ids_at_q_idx == 1) & (token_type_ids_at_kv_idx == 1)
|
||||
same_image_block = image_group_ids_at_q_idx == image_group_ids_at_kv_idx
|
||||
is_image_block = (token_type_ids[batch_idx, q_idx] == 1) & (token_type_ids_at_kv_idx == 1)
|
||||
same_image_block = image_group_ids[batch_idx, q_idx] == image_group_ids_at_kv_idx
|
||||
|
||||
# This is bidirectional attention whenever we are dealing with image tokens
|
||||
return is_image_block & same_image_block
|
||||
|
@ -14,7 +14,15 @@
|
||||
# limitations under the License.
|
||||
"""Perceiver model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any, Union
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...feature_extraction_utils import FeatureExtractionMixin
|
||||
from ...onnx import OnnxConfig
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...tokenization_utils_base import PreTrainedTokenizerBase
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -174,4 +182,63 @@ class PerceiverConfig(PreTrainedConfig):
|
||||
self._label_trainable_num_channels = _label_trainable_num_channels
|
||||
|
||||
|
||||
__all__ = ["PerceiverConfig"]
|
||||
class PerceiverOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("inputs", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
num_choices: int = -1,
|
||||
is_pair: bool = False,
|
||||
num_channels: int = 3,
|
||||
image_width: int = 40,
|
||||
image_height: int = 40,
|
||||
) -> Mapping[str, Any]:
|
||||
# copied from `transformers.onnx.config.OnnxConfig` and slightly altered/simplified
|
||||
|
||||
if isinstance(preprocessor, PreTrainedTokenizerBase):
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(
|
||||
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
|
||||
)
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
|
||||
token_to_add = preprocessor.num_special_tokens_to_add(is_pair)
|
||||
seq_length = compute_effective_axis_dimension(
|
||||
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
|
||||
)
|
||||
# Generate dummy inputs according to compute batch and sequence
|
||||
dummy_input = [" ".join(["a"]) * seq_length] * batch_size
|
||||
inputs = dict(preprocessor(dummy_input, return_tensors="pt"))
|
||||
inputs["inputs"] = inputs.pop("input_ids")
|
||||
return inputs
|
||||
elif isinstance(preprocessor, FeatureExtractionMixin) and preprocessor.model_input_names[0] == "pixel_values":
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
|
||||
dummy_input = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
|
||||
inputs = dict(preprocessor(images=dummy_input, return_tensors="pt"))
|
||||
inputs["inputs"] = inputs.pop("pixel_values")
|
||||
return inputs
|
||||
else:
|
||||
raise ValueError(
|
||||
"Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor."
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["PerceiverConfig", "PerceiverOnnxConfig"]
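Reviewer note: `generate_dummy_inputs` in this hunk resolves a dynamic axis (`-1`) to a fixed size before building dummy data. The sketch below shows that resolution under stated assumptions: `effective_axis_dimension` is a hypothetical stand-in written for illustration, and the real `compute_effective_axis_dimension` in `transformers.onnx.utils` may differ in detail. The fixed sizes 2 and 8 are taken from the comments in the hunk.

```python
def effective_axis_dimension(dimension: int, fixed_dimension: int, num_token_to_add: int = 0) -> int:
    """Hypothetical stand-in: resolve a possibly dynamic axis to a concrete size."""
    # A non-positive value (such as -1) marks the axis as dynamic, so fall back to a fixed size.
    if dimension <= 0:
        dimension = fixed_dimension
    # Leave room for the special tokens the tokenizer adds on its own.
    return dimension - num_token_to_add


if __name__ == "__main__":
    # Dynamic batch axis resolved to the fixed batch size of 2 samples.
    print(effective_axis_dimension(-1, fixed_dimension=2))
    # Dynamic sequence axis resolved to 8 tokens, minus 2 special tokens.
    print(effective_axis_dimension(-1, fixed_dimension=8, num_token_to_add=2))
```

An explicitly requested size is kept as given, apart from the special-token adjustment.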
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""PLBART model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -161,4 +165,33 @@ class PLBartConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
class PLBartOnnxConfig(OnnxConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.use_past:
|
||||
return OrderedDict(
|
||||
[
|
||||
("last_hidden_state", {0: "batch", 1: "sequence"}),
|
||||
("past_keys", {0: "batch", 2: "sequence"}),
|
||||
("encoder_last_hidden_state", {0: "batch", 1: "sequence"}),
|
||||
]
|
||||
)
|
||||
else:
|
||||
return OrderedDict(
|
||||
[
|
||||
("last_hidden_state", {0: "batch", 1: "sequence"}),
|
||||
("encoder_last_hidden_state", {0: "batch", 1: "sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["PLBartConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""PoolFormer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -123,4 +129,20 @@ class PoolFormerConfig(PreTrainedConfig):
|
||||
super().__init__(**kwargs)
|
||||
|
||||
|
||||
__all__ = ["PoolFormerConfig"]
|
||||
class PoolFormerOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 2e-3
|
||||
|
||||
|
||||
__all__ = ["PoolFormerConfig", "PoolFormerOnnxConfig"]
|
||||
|
@ -16,9 +16,13 @@
|
||||
# limitations under the License.
|
||||
"""Pvt model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Callable, Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -135,4 +139,24 @@ class PvtConfig(PreTrainedConfig):
|
||||
self.qkv_bias = qkv_bias
|
||||
|
||||
|
||||
__all__ = ["PvtConfig"]
|
||||
class PvtOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 12
|
||||
|
||||
|
||||
__all__ = ["PvtConfig", "PvtOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""RemBERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -135,4 +139,24 @@ class RemBertConfig(PreTrainedConfig):
|
||||
self.tie_word_embeddings = False
|
||||
|
||||
|
||||
__all__ = ["RemBertConfig"]
|
||||
class RemBertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["RemBertConfig", "RemBertOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""ResNet model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
|
||||
|
||||
@ -111,4 +117,20 @@ class ResNetConfig(BackboneConfigMixin, PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["ResNetConfig"]
|
||||
class ResNetOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-3
|
||||
|
||||
|
||||
__all__ = ["ResNetConfig", "ResNetOnnxConfig"]
|
||||
|
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""RoBERTa configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -125,4 +129,19 @@ class RobertaConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["RobertaConfig"]
|
||||
class RobertaOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["RobertaConfig", "RobertaOnnxConfig"]
|
||||
|
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""RoBERTa-PreLayerNorm configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -126,4 +130,20 @@ class RobertaPreLayerNormConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["RobertaPreLayerNormConfig"]
|
||||
# Copied from transformers.models.roberta.configuration_roberta.RobertaOnnxConfig with Roberta->RobertaPreLayerNorm
|
||||
class RobertaPreLayerNormOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["RobertaPreLayerNormConfig", "RobertaPreLayerNormOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""RoFormer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -126,4 +130,21 @@ class RoFormerConfig(PreTrainedConfig):
|
||||
self.use_cache = use_cache
|
||||
|
||||
|
||||
__all__ = ["RoFormerConfig"]
|
||||
class RoFormerOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["RoFormerConfig", "RoFormerOnnxConfig"]
|
||||
|
@ -15,8 +15,13 @@
|
||||
"""SegFormer model configuration"""
|
||||
|
||||
import warnings
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -143,4 +148,24 @@ class SegformerConfig(PreTrainedConfig):
|
||||
self.semantic_loss_ignore_index = semantic_loss_ignore_index
|
||||
|
||||
|
||||
__all__ = ["SegformerConfig"]
|
||||
class SegformerOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 12
|
||||
|
||||
|
||||
__all__ = ["SegformerConfig", "SegformerOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""SqueezeBERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -143,4 +147,21 @@ class SqueezeBertConfig(PreTrainedConfig):
|
||||
self.output_groups = output_groups
|
||||
|
||||
|
||||
__all__ = ["SqueezeBertConfig"]
|
||||
# Copied from transformers.models.bert.configuration_bert.BertOnnxConfig with Bert->SqueezeBert
|
||||
class SqueezeBertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["SqueezeBertConfig", "SqueezeBertOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""SwiftFormer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -123,4 +129,20 @@ class SwiftFormerConfig(PreTrainedConfig):
|
||||
self.batch_norm_eps = batch_norm_eps
|
||||
|
||||
|
||||
__all__ = ["SwiftFormerConfig"]
|
||||
class SwiftFormerOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["SwiftFormerConfig", "SwiftFormerOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""Swin Transformer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
|
||||
|
||||
@ -154,4 +160,20 @@ class SwinConfig(BackboneConfigMixin, PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["SwinConfig"]
|
||||
class SwinOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["SwinConfig", "SwinOnnxConfig"]
|
||||
|
@ -14,7 +14,10 @@
|
||||
# limitations under the License.
|
||||
"""T5 model configuration"""
|
||||
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxSeq2SeqConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -140,4 +143,29 @@ class T5Config(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["T5Config"]
|
||||
class T5OnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = {
|
||||
"input_ids": {0: "batch", 1: "encoder_sequence"},
|
||||
"attention_mask": {0: "batch", 1: "encoder_sequence"},
|
||||
}
|
||||
if self.use_past:
|
||||
common_inputs["attention_mask"][1] = "past_encoder_sequence + sequence"
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
|
||||
__all__ = ["T5Config", "T5OnnxConfig"]
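Reviewer note: the `inputs` property in this hunk changes the decoder axes depending on `use_past`. The sketch below reproduces only that branching as a plain function. `seq2seq_dynamic_axes` is a hypothetical name used for illustration, and the `fill_with_past_key_values_` step is deliberately left out because its behavior is not shown in this diff.

```python
def seq2seq_dynamic_axes(use_past: bool) -> dict:
    """Hypothetical helper mirroring the branching in the `inputs` property above."""
    common_inputs = {
        "input_ids": {0: "batch", 1: "encoder_sequence"},
        "attention_mask": {0: "batch", 1: "encoder_sequence"},
    }
    if use_past:
        # With cached key/values the attention mask spans past and new tokens.
        common_inputs["attention_mask"][1] = "past_encoder_sequence + sequence"
        # Only the batch axis of the decoder ids is declared dynamic in this branch.
        common_inputs["decoder_input_ids"] = {0: "batch"}
        common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
    else:
        common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
        common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
    # NOTE: the real config also adds past key/value entries here when use_past is set.
    return common_inputs


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, seq2seq_dynamic_axes(flag))
```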
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""Table Transformer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ...utils.backbone_utils import verify_backbone_config_arguments
|
||||
from ..auto import CONFIG_MAPPING, AutoConfig
|
||||
@ -241,4 +247,26 @@ class TableTransformerConfig(PreTrainedConfig):
|
||||
super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs)
|
||||
|
||||
|
||||
__all__ = ["TableTransformerConfig"]
|
||||
# Copied from transformers.models.detr.configuration_detr.DetrOnnxConfig
|
||||
class TableTransformerOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
("pixel_mask", {0: "batch"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-5
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 12
|
||||
|
||||
|
||||
__all__ = ["TableTransformerConfig", "TableTransformerOnnxConfig"]
|
||||
|
@ -14,7 +14,10 @@
|
||||
# limitations under the License.
|
||||
"""UMT5 model configuration"""
|
||||
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxSeq2SeqConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -144,4 +147,35 @@ class UMT5Config(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["UMT5Config"]
|
||||
class UMT5OnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
# Copied from transformers.models.t5.configuration_t5.T5OnnxConfig.inputs
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = {
|
||||
"input_ids": {0: "batch", 1: "encoder_sequence"},
|
||||
"attention_mask": {0: "batch", 1: "encoder_sequence"},
|
||||
}
|
||||
if self.use_past:
|
||||
common_inputs["attention_mask"][1] = "past_encoder_sequence + sequence"
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
# Copied from transformers.models.t5.configuration_t5.T5OnnxConfig.default_onnx_opset
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 5e-4
|
||||
|
||||
|
||||
__all__ = ["UMT5Config", "UMT5OnnxConfig"]
|
||||
|
@ -14,11 +14,21 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ..auto.configuration_auto import AutoConfig
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ... import PreTrainedTokenizerBase
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
|
||||
@ -108,4 +118,100 @@ class VisionEncoderDecoderConfig(PreTrainedConfig):
|
||||
return cls(encoder=encoder_config.to_dict(), decoder=decoder_config.to_dict(), **kwargs)
|
||||
|
||||
|
||||
__all__ = ["VisionEncoderDecoderConfig"]
|
||||
class VisionEncoderDecoderEncoderOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict({"last_hidden_state": {0: "batch", 1: "encoder_sequence"}})
|
||||
|
||||
|
||||
class VisionEncoderDecoderDecoderOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = OrderedDict()
|
||||
common_inputs["input_ids"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
common_inputs["encoder_hidden_states"] = {0: "batch", 1: "encoder_sequence"}
|
||||
|
||||
return common_inputs
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: "PreTrainedTokenizerBase",
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
import torch
|
||||
|
||||
common_inputs = OrderedDict()
|
||||
|
||||
dummy_input = super().generate_dummy_inputs(
|
||||
tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
is_pair=is_pair,
|
||||
)
|
||||
|
||||
batch, encoder_sequence = dummy_input["input_ids"].shape
|
||||
encoder_hidden_states_shape = (batch, encoder_sequence, self._config.encoder_hidden_size)
|
||||
common_inputs["input_ids"] = dummy_input.pop("input_ids")
|
||||
common_inputs["attention_mask"] = dummy_input.pop("attention_mask")
|
||||
common_inputs["encoder_hidden_states"] = torch.zeros(encoder_hidden_states_shape)
|
||||
|
||||
return common_inputs
|
||||
|
||||
|
||||
class VisionEncoderDecoderOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> None:
|
||||
pass
|
||||
|
||||
def get_encoder_config(self, encoder_config: PreTrainedConfig) -> OnnxConfig:
|
||||
r"""
|
||||
Returns ONNX encoder config for `VisionEncoderDecoder` model.
|
||||
|
||||
Args:
|
||||
encoder_config (`PreTrainedConfig`):
|
||||
The encoder model's configuration to use when exporting to ONNX.
|
||||
|
||||
Returns:
|
||||
[`VisionEncoderDecoderEncoderOnnxConfig`]: An instance of the ONNX configuration object.
|
||||
"""
|
||||
return VisionEncoderDecoderEncoderOnnxConfig(encoder_config)
|
||||
|
||||
def get_decoder_config(
|
||||
self, encoder_config: PreTrainedConfig, decoder_config: PreTrainedConfig, feature: str = "default"
|
||||
) -> OnnxConfig:
|
||||
r"""
|
||||
Returns ONNX decoder config for `VisionEncoderDecoder` model.
|
||||
|
||||
Args:
|
||||
encoder_config (`PreTrainedConfig`):
|
||||
The encoder model's configuration to use when exporting to ONNX.
|
||||
decoder_config (`PreTrainedConfig`):
|
||||
The decoder model's configuration to use when exporting to ONNX.
|
||||
feature (`str`, *optional*):
|
||||
The type of feature to export the model with.
|
||||
|
||||
Returns:
|
||||
[`VisionEncoderDecoderDecoderOnnxConfig`]: An instance of the ONNX configuration object.
|
||||
"""
|
||||
decoder_config.encoder_hidden_size = encoder_config.hidden_size
|
||||
return VisionEncoderDecoderDecoderOnnxConfig(decoder_config, feature)
|
||||
|
||||
|
||||
__all__ = ["VisionEncoderDecoderConfig", "VisionEncoderDecoderOnnxConfig"]
|
||||
|
Some files were not shown because too many files have changed in this diff.