Mirror of https://github.com/huggingface/transformers.git (synced 2025-10-22 02:08:58 +08:00)

Compare commits: remove-env...vision_vis (16 commits)

Commits (SHA1): 557ecce22e, f3b187027a, 2767a59df9, c9f1003c70, b356fce1da, af7f75e682, 34ba5909a2, fbec904fb0, a1263dfe7b, 1878d6c4ff, a6a18efe53, e581d2f2ce, 1f6822d114, edb70ae15c, 27bc371bea, 58c619e809
@ -32,7 +32,7 @@

To export a 🤗 Transformers model to ONNX, first install an extra dependency:

```bash
pip install optimum-onnx
pip install optimum[exporters]
```

To see all available arguments, refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli), or view the help from the command line:

@ -111,3 +111,60 @@ optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_s

### Exporting a model for an unsupported architecture

If you would like to contribute by adding support for a model that cannot currently be exported, first check whether it is supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview); if it is not, you can [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute) directly.

### Exporting a model with `transformers.onnx`

<Tip warning={true}>

`transformers.onnx` is no longer supported; please export models with 🤗 Optimum as described above. This section will be removed in future versions.

</Tip>

To export a 🤗 Transformers model to ONNX with `transformers.onnx`, install the extra dependencies:

```bash
pip install transformers[onnx]
```

Use the `transformers.onnx` package as a Python module to export a checkpoint with a ready-made configuration:

```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```

This exports an ONNX graph of the checkpoint defined by the `--model` argument. Pass any checkpoint on the 🤗 Hub or one stored locally.
The resulting `model.onnx` file can then be run on one of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows:

```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```

The required output names (such as `["last_hidden_state"]`) can be obtained by looking at the ONNX configuration of each model. For example, for DistilBERT we have:

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```

The process is identical for TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint as follows:

```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```

To export a model stored locally, save the model's weights and tokenizer in the same directory (for example `local-pt-checkpoint`), then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to that directory:

```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```
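For the local-checkpoint case, a minimal sketch of producing such a directory is shown below (the directory name `local-pt-checkpoint` is just the example used above; any path works):

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("distilbert/distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")

# transformers.onnx expects the weights and the tokenizer files side by side
model.save_pretrained("local-pt-checkpoint")
tokenizer.save_pretrained("local-pt-checkpoint")
```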
@ -33,7 +33,7 @@ Export a Transformers model to ONNX with the Optimum CLI or the `optimum.onnxrun

Run the command below to install Optimum and the [exporters](https://huggingface.co/docs/optimum/exporters/overview) module.

```bash
pip install optimum-onnx
pip install optimum[exporters]
```

> [!TIP]
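The hunk header above mentions exporting either with the Optimum CLI or with `optimum.onnxruntime`. A minimal sketch of the programmatic route, assuming the current Optimum API (the model id and output path are illustrative):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"

# export=True converts the checkpoint to ONNX on the fly
ort_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

ort_model.save_pretrained("onnx_sst2/")
tokenizer.save_pretrained("onnx_sst2/")
```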
docs/source/ja/main_classes/onnx.md (new file, 50 lines)
@ -0,0 +1,50 @@

<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Exporting 🤗 Transformers models to ONNX

🤗 Transformers provides a `transformers.onnx` package that lets you convert model checkpoints to an ONNX graph by leveraging configuration objects.

See the [guide](../serialization) for details.

## ONNX Configurations

We provide three abstract classes that you should inherit from, depending on the type of model architecture you wish to export:

* Encoder-based models inherit from [`~onnx.config.OnnxConfig`]
* Decoder-based models inherit from [`~onnx.config.OnnxConfigWithPast`]
* Encoder-decoder models inherit from [`~onnx.config.OnnxSeq2SeqConfigWithPast`]

### OnnxConfig

[[autodoc]] onnx.config.OnnxConfig

### OnnxConfigWithPast

[[autodoc]] onnx.config.OnnxConfigWithPast

### OnnxSeq2SeqConfigWithPast

[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast

## ONNX Features

Each ONNX configuration is associated with a set of _features_ that let you export models for different types of topologies or tasks.
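To make the three classes concrete, here is a minimal encoder-style configuration, mirroring the `BertOnnxConfig` pattern that appears later in this diff (the class name is illustrative):

```python
from collections import OrderedDict
from collections.abc import Mapping

from transformers.onnx import OnnxConfig


class MyEncoderOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        # batch and sequence axes are dynamic in the exported graph
        dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )
```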
@ -47,7 +47,7 @@ A model exported to ONNX format can be used as follows

To export a 🤗 Transformers model to ONNX, first install the extra dependencies:

```bash
pip install optimum-onnx
pip install optimum[exporters]
```

To see all available arguments, refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli). You can also view the help from the command line:

@ -128,3 +128,64 @@ Instead of the CLI, you can also export a 🤗 Transformers model to ONNX programmatically

### Exporting a model for an unsupported architecture

If you want to contribute by adding support for a model that cannot currently be exported, first check whether it is supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview) and, if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute).

### Exporting a model with `transformers.onnx`

<Tip warning={true}>

`transformers.onnx` is no longer maintained; please export models with 🤗 Optimum as described above. This section will be removed in future versions.

</Tip>

To export a 🤗 Transformers model to ONNX, install the extra dependencies:

```bash
pip install transformers[onnx]
```

Use the `transformers.onnx` package as a Python module to export a checkpoint with a ready-made configuration:

```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```

This exports an ONNX graph of the checkpoint defined by the `--model` argument. You can pass any checkpoint on the 🤗 Hub or one stored locally. The exported `model.onnx` file can run on any of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows:

```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```

The required output names (e.g. `["last_hidden_state"]`) can be obtained by checking the ONNX configuration of each model. For example, for DistilBERT:

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```

The process for programmatically exporting a pure TensorFlow checkpoint from the Hub is the same:

```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```

To export a locally stored model, save the model's weights and tokenizer files in the same directory (e.g. `local-pt-checkpoint`), then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to that directory:

```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```
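Earlier in this hunk the required output names were read from `DistilBertOnnxConfig`; the same object also exposes the expected input names, which is handy when building the `input_feed` dictionary by hand:

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> onnx_config = DistilBertOnnxConfig(DistilBertConfig())
>>> print(list(onnx_config.inputs.keys()))
['input_ids', 'attention_mask']
```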
docs/source/ko/main_classes/onnx.md (new file, 45 lines)
@ -0,0 +1,45 @@

<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Exporting 🤗 Transformers models to ONNX[[exporting--transformers-models-to-onnx]]

🤗 Transformers provides a `transformers.onnx` package that lets you convert model checkpoints to an ONNX graph by leveraging configuration objects.

For more details on 🤗 Transformers, see [this guide](../serialization).

## ONNX Configurations[[onnx-configurations]]

We provide three abstract classes that you should inherit from, depending on the type of model architecture you wish to export:

* Encoder-based models inherit from [`~onnx.config.OnnxConfig`]
* Decoder-based models inherit from [`~onnx.config.OnnxConfigWithPast`]
* Encoder-decoder models inherit from [`~onnx.config.OnnxSeq2SeqConfigWithPast`]

### OnnxConfig[[transformers.onnx.OnnxConfig]]

[[autodoc]] onnx.config.OnnxConfig

### OnnxConfigWithPast[[transformers.onnx.OnnxConfigWithPast]]

[[autodoc]] onnx.config.OnnxConfigWithPast

### OnnxSeq2SeqConfigWithPast[[OnnxSeq2SeqConfigWithPast]]

[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast

## ONNX Features[[onnx-features]]

Each ONNX configuration is associated with a set of _features_ that let you export models for different types of topologies or tasks.
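As a small illustration of how features map onto configurations, the task passed to a config decides which outputs get exported; a sketch with DistilBERT (the printed values assume the default task-to-output mapping):

```python
from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

config = DistilBertConfig()

# "default" exports the bare model, "sequence-classification" exports a classification head
print(list(DistilBertOnnxConfig(config, task="default").outputs.keys()))
print(list(DistilBertOnnxConfig(config, task="sequence-classification").outputs.keys()))
# ['last_hidden_state']
# ['logits']
```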
@ -47,7 +47,7 @@ A model exported to ONNX format can be used as follows

To export a 🤗 Transformers model to ONNX, first install the extra dependencies:

```bash
pip install optimum-onnx
pip install optimum[exporters]
```

To see all available arguments, refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli), or view the help from the command line.

@ -123,3 +123,59 @@ Instead of the CLI, you can use `optimum.onnxruntime` to export programmatically

### Exporting a model for an unsupported architecture [[exporting-a-model-for-an-unsupported-architecture]]

To contribute support for a model that cannot currently be exported, first check whether it is supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview); if it is not, [contribute to 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute).

### Exporting a model with `transformers.onnx` [[exporting-a-model-with-transformersonnx]]

<Tip warning={true}>

`transformers.onnx` is no longer maintained; export models with 🤗 Optimum as described above. This section will be removed in a future version.

</Tip>

To export a 🤗 Transformers model to ONNX, install the extra dependencies:

```bash
pip install transformers[onnx]
```

Use the `transformers.onnx` package as a Python module to export a checkpoint with a ready-made configuration:

```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```

This exports an ONNX graph of the checkpoint defined by the `--model` argument. You can pass a checkpoint from the 🤗 Hub or one stored locally. The resulting `model.onnx` file can run on one of the many accelerators that support the ONNX standard. For example, you can load and run the model with ONNX Runtime as follows:

```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```

The required output names (e.g. `["last_hidden_state"]`) can be obtained by checking the ONNX configuration of each model. For example, for DistilBERT:

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```

The same process applies to TensorFlow checkpoints on the Hub. For example, export a pure TensorFlow checkpoint as follows:

```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```

To export a locally stored model, save the model's weight and tokenizer files in the same directory, then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to that directory:

```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```
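A quick structural sanity check on the exported file can be run with the `onnx` Python package (assumed to be installed; the path matches the export command above):

```python
import onnx

# Load the exported graph and let ONNX verify that it is well-formed
onnx_model = onnx.load("onnx/model.onnx")
onnx.checker.check_model(onnx_model)
```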
docs/source/zh/main_classes/onnx.md (new file, 45 lines)
@ -0,0 +1,45 @@

<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Exporting 🤗 Transformers models to ONNX

🤗 Transformers provides a `transformers.onnx` package that lets you convert model checkpoints to an ONNX graph by leveraging configuration objects.

For more details, see the [guide](../serialization) on exporting 🤗 Transformers models.

## ONNX Configurations

We provide three abstract classes that you should inherit from, depending on the type of model architecture you wish to export:

* Encoder-based models inherit from [`~onnx.config.OnnxConfig`]
* Decoder-based models inherit from [`~onnx.config.OnnxConfigWithPast`]
* Encoder-decoder models inherit from [`~onnx.config.OnnxSeq2SeqConfigWithPast`]

### OnnxConfig

[[autodoc]] onnx.config.OnnxConfig

### OnnxConfigWithPast

[[autodoc]] onnx.config.OnnxConfigWithPast

### OnnxSeq2SeqConfigWithPast

[[autodoc]] onnx.config.OnnxSeq2SeqConfigWithPast

## ONNX Features

Each ONNX configuration is associated with a set of _features_ that let you export models for different types of topologies or tasks.
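For the encoder-decoder case, the seq2seq variant adds decoder-side axes; a quick sketch using the `BartOnnxConfig` added later in this diff (the expected key list assumes the default task without past key values):

```python
from transformers import BartConfig
from transformers.models.bart.configuration_bart import BartOnnxConfig

onnx_config = BartOnnxConfig(BartConfig(), task="default")
print(list(onnx_config.inputs.keys()))
# ['input_ids', 'attention_mask', 'decoder_input_ids', 'decoder_attention_mask']
```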
@ -47,7 +47,7 @@ rendered properly in your Markdown viewer.

To export a 🤗 Transformers model to ONNX, you first need to install the extra dependencies:

```bash
pip install optimum-onnx
pip install optimum[exporters]
```

Refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli) for all available arguments, or view the help from the command line:

@ -117,3 +117,53 @@ optimum-cli export onnx --model local_path --task question-answering distilbert_

### Exporting a model for an unsupported architecture

If you want to add support for a model that cannot currently be exported, first check whether it is supported in [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview); if it is not, you can [contribute to 🤗 Optimum directly](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute).

### Exporting a model with `transformers.onnx`

<Tip warning={true}>

`transformers.onnx` is no longer maintained; export models with 🤗 Optimum as described above. This section will be removed in future versions.

</Tip>

To export a 🤗 Transformers model to ONNX with `transformers.onnx`, install the extra dependencies:

```bash
pip install transformers[onnx]
```

Use the `transformers.onnx` package as a Python module to export a checkpoint with a ready-made configuration:

```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```

This exports an ONNX graph of the checkpoint defined by the `--model` argument. Pass any checkpoint on the 🤗 Hub or one stored locally. The resulting `model.onnx` file can run on any of the many accelerators that support the ONNX standard. For example, load and run the model with ONNX Runtime as follows:

```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # ONNX Runtime expects NumPy arrays as input
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```

The required output names (e.g. `["last_hidden_state"]`) can be obtained by looking at the ONNX configuration of each model. For example, for DistilBERT the output names can be obtained with:

```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig

>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```

To export a locally stored model, save the model's weights and tokenizer files in the same directory (e.g. `local-pt-checkpoint`), then export it to ONNX by pointing the `--model` argument of the `transformers.onnx` package to that directory:

```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```
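To confirm the export is numerically faithful, the ONNX Runtime output can be compared against the original PyTorch model; a sketch continuing the DistilBERT example above (the tolerance is illustrative):

```python
import numpy as np
import torch
from onnxruntime import InferenceSession
from transformers import AutoModel, AutoTokenizer

checkpoint = "distilbert/distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
pt_model = AutoModel.from_pretrained(checkpoint)
session = InferenceSession("onnx/model.onnx")

text = "Using DistilBERT with ONNX Runtime!"
onnx_out = session.run(["last_hidden_state"], dict(tokenizer(text, return_tensors="np")))[0]
with torch.no_grad():
    pt_out = pt_model(**tokenizer(text, return_tensors="pt")).last_hidden_state

print(np.allclose(onnx_out, pt_out.numpy(), atol=1e-4))
```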
@ -129,6 +129,8 @@ _import_structure = {
    ],
    "loss": [],
    "modelcard": ["ModelCard"],
    # Models
    "onnx": [],
    "pipelines": [
        "AudioClassificationPipeline",
        "AutomaticSpeechRecognitionPipeline",
@ -442,6 +442,75 @@ def normalize(
    return image


def unnormalize(
    image: Union[np.ndarray, "torch.Tensor"],
    mean: Union[float, Collection[float]],
    std: Union[float, Collection[float]],
    data_format: Optional[ChannelDimension] = None,
    input_data_format: Optional[Union[str, ChannelDimension]] = None,
) -> np.ndarray:
    """
    Inverse of `normalize`:

        image = image * std + mean

    Args:
        image (`np.ndarray` or `torch.Tensor`):
            The image to unnormalize.
        mean (`float` or `Collection[float]`):
            The mean to use for unnormalization.
        std (`float` or `Collection[float]`):
            The standard deviation to use for unnormalization.
        data_format (`ChannelDimension`, *optional*):
            The channel dimension format of the output image. If unset, will use the inferred format from the input.
        input_data_format (`ChannelDimension`, *optional*):
            The channel dimension format of the input image. If unset, will use the inferred format from the input.

    Returns:
        `np.ndarray`: The unnormalized image.
    """
    is_torch_input = isinstance(image, torch.Tensor)
    if is_torch_input:
        image = image.detach().cpu().numpy()
    elif not isinstance(image, np.ndarray):
        raise TypeError("image must be a numpy array or a torch tensor")

    if input_data_format is None:
        input_data_format = infer_channel_dimension_format(image)

    if not np.issubdtype(image.dtype, np.floating):
        image = image.astype(np.float32)

    channel_axis = get_channel_dimension_axis(image, input_data_format=input_data_format)
    num_channels = image.shape[channel_axis]

    if isinstance(mean, Collection):
        if len(mean) != num_channels:
            raise ValueError(f"mean must have {num_channels} elements if it is an iterable, got {len(mean)}")
    else:
        mean = [mean] * num_channels
    mean = np.array(mean, dtype=image.dtype)

    if isinstance(std, Collection):
        if len(std) != num_channels:
            raise ValueError(f"std must have {num_channels} elements if it is an iterable, got {len(std)}")
    else:
        std = [std] * num_channels
    std = np.array(std, dtype=image.dtype)

    if input_data_format == ChannelDimension.LAST:
        image = image * std + mean
    else:
        shape = [1] * image.ndim
        shape[channel_axis] = num_channels
        mean = mean.reshape(shape)
        std = std.reshape(shape)
        image = image * std + mean

    image = to_channel_dimension_format(image, data_format, input_data_format) if data_format is not None else image
    return image


def center_crop(
    image: np.ndarray,
    size: tuple[int, int],
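A usage sketch for the `unnormalize` helper added in the hunk above (the mean/std values are the usual ImageNet statistics, used here only as an illustration):

```python
import numpy as np

# A channels-first image that was previously normalized with ImageNet statistics
normalized = np.random.randn(3, 224, 224).astype(np.float32)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

restored = unnormalize(normalized, mean=mean, std=std)
print(restored.shape)  # (3, 224, 224)
```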
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""ALBERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
|
||||
|
||||
class AlbertConfig(PreTrainedConfig):
|
||||
@ -138,4 +142,21 @@ class AlbertConfig(PreTrainedConfig):
|
||||
self.classifier_dropout_prob = classifier_dropout_prob
|
||||
|
||||
|
||||
__all__ = ["AlbertConfig"]
|
||||
# Copied from transformers.models.bert.configuration_bert.BertOnnxConfig with Roberta->Albert
|
||||
class AlbertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["AlbertConfig", "AlbertOnnxConfig"]
|
||||
|
@ -15,9 +15,15 @@
|
||||
"""BART model configuration"""
|
||||
|
||||
import warnings
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any
|
||||
|
||||
from ... import PreTrainedTokenizer
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...utils import logging
|
||||
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...utils import is_torch_available, logging
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
@ -174,4 +180,223 @@ class BartConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BartConfig"]
|
||||
class BartOnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
elif self.task == "causal-lm":
|
||||
# TODO: figure this case out.
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
|
||||
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_outputs = super().outputs
|
||||
else:
|
||||
common_outputs = super(OnnxConfigWithPast, self).outputs
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
return common_outputs
|
||||
|
||||
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
# Generate decoder inputs
|
||||
decoder_seq_length = seq_length if not self.use_past else 1
|
||||
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, decoder_seq_length, is_pair
|
||||
)
|
||||
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
|
||||
common_inputs = dict(**encoder_inputs, **decoder_inputs)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, encoder_seq_length = common_inputs["input_ids"].shape
|
||||
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
|
||||
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
|
||||
encoder_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
encoder_seq_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
decoder_past_length = decoder_seq_length + 3
|
||||
decoder_shape = (
|
||||
batch,
|
||||
num_decoder_attention_heads,
|
||||
decoder_past_length,
|
||||
self._config.hidden_size // num_decoder_attention_heads,
|
||||
)
|
||||
|
||||
common_inputs["decoder_attention_mask"] = torch.cat(
|
||||
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
|
||||
)
|
||||
|
||||
common_inputs["past_key_values"] = []
|
||||
# If the number of encoder and decoder layers are present in the model configuration, both are considered
|
||||
num_encoder_layers, num_decoder_layers = self.num_layers
|
||||
min_num_layers = min(num_encoder_layers, num_decoder_layers)
|
||||
max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
|
||||
remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"
|
||||
|
||||
for _ in range(min_num_layers):
|
||||
common_inputs["past_key_values"].append(
|
||||
(
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
)
|
||||
)
|
||||
# TODO: test this.
|
||||
shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
|
||||
for _ in range(min_num_layers, max_num_layers):
|
||||
common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
|
||||
return common_inputs
|
||||
|
||||
def _generate_dummy_inputs_for_causal_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
# Not using the same length for past_key_values
|
||||
past_key_values_length = seqlen + 2
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
num_encoder_attention_heads, _ = self.num_attention_heads
|
||||
past_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
|
||||
mask_dtype = common_inputs["attention_mask"].dtype
|
||||
common_inputs["attention_mask"] = torch.cat(
|
||||
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
common_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
|
||||
]
|
||||
return common_inputs
|
||||
|
||||
def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
# Copied from OnnxConfig.generate_dummy_inputs
|
||||
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(
|
||||
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
|
||||
)
|
||||
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
|
||||
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
|
||||
seq_length = compute_effective_axis_dimension(
|
||||
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
|
||||
)
|
||||
|
||||
# Generate dummy inputs according to compute batch and sequence
|
||||
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
|
||||
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
|
||||
return common_inputs
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
elif self.task == "causal-lm":
|
||||
common_inputs = self._generate_dummy_inputs_for_causal_lm(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
else:
|
||||
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
def _flatten_past_key_values_(self, flattened_output, name, idx, t):
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
|
||||
else:
|
||||
flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
|
||||
flattened_output, name, idx, t
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BartConfig", "BartOnnxConfig"]
|
||||
|
@ -15,8 +15,13 @@
|
||||
"""BEiT model configuration"""
|
||||
|
||||
import warnings
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
|
||||
|
||||
|
||||
@ -204,4 +209,21 @@ class BeitConfig(BackboneConfigMixin, PreTrainedConfig):
|
||||
self.reshape_hidden_states = reshape_hidden_states
|
||||
|
||||
|
||||
__all__ = ["BeitConfig"]
|
||||
# Copied from transformers.models.vit.configuration_vit.ViTOnnxConfig
|
||||
class BeitOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["BeitConfig", "BeitOnnxConfig"]
|
||||
|
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""BERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -123,4 +127,20 @@ class BertConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["BertConfig"]
|
||||
class BertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BertConfig", "BertOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""BigBird model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -154,4 +158,19 @@ class BigBirdConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["BigBirdConfig"]
|
||||
class BigBirdOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BigBirdConfig", "BigBirdOnnxConfig"]
|
||||
|
@ -14,8 +14,15 @@
|
||||
# limitations under the License.
|
||||
"""BigBirdPegasus model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any
|
||||
|
||||
from ... import PreTrainedTokenizer
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...utils import logging
|
||||
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...utils import is_torch_available, logging
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
@ -179,4 +186,224 @@ class BigBirdPegasusConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BigBirdPegasusConfig"]
|
||||
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig with Bart->BigBirdPegasus
|
||||
class BigBirdPegasusOnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
elif self.task == "causal-lm":
|
||||
# TODO: figure this case out.
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
|
||||
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_outputs = super().outputs
|
||||
else:
|
||||
common_outputs = super(OnnxConfigWithPast, self).outputs
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
return common_outputs
|
||||
|
||||
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
# Generate decoder inputs
|
||||
decoder_seq_length = seq_length if not self.use_past else 1
|
||||
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, decoder_seq_length, is_pair
|
||||
)
|
||||
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
|
||||
common_inputs = dict(**encoder_inputs, **decoder_inputs)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, encoder_seq_length = common_inputs["input_ids"].shape
|
||||
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
|
||||
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
|
||||
encoder_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
encoder_seq_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
decoder_past_length = decoder_seq_length + 3
|
||||
decoder_shape = (
|
||||
batch,
|
||||
num_decoder_attention_heads,
|
||||
decoder_past_length,
|
||||
self._config.hidden_size // num_decoder_attention_heads,
|
||||
)
|
||||
|
||||
common_inputs["decoder_attention_mask"] = torch.cat(
|
||||
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
|
||||
)
|
||||
|
||||
common_inputs["past_key_values"] = []
|
||||
# If the number of encoder and decoder layers are present in the model configuration, both are considered
|
||||
num_encoder_layers, num_decoder_layers = self.num_layers
|
||||
min_num_layers = min(num_encoder_layers, num_decoder_layers)
|
||||
max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
|
||||
remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"
|
||||
|
||||
for _ in range(min_num_layers):
|
||||
common_inputs["past_key_values"].append(
|
||||
(
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
)
|
||||
)
|
||||
# TODO: test this.
|
||||
shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
|
||||
for _ in range(min_num_layers, max_num_layers):
|
||||
common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
|
||||
return common_inputs
|
||||
|
||||
def _generate_dummy_inputs_for_causal_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
# Not using the same length for past_key_values
|
||||
past_key_values_length = seqlen + 2
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
num_encoder_attention_heads, _ = self.num_attention_heads
|
||||
past_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
|
||||
mask_dtype = common_inputs["attention_mask"].dtype
|
||||
common_inputs["attention_mask"] = torch.cat(
|
||||
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
common_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
|
||||
]
|
||||
return common_inputs
|
||||
|
||||
def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
# Copied from OnnxConfig.generate_dummy_inputs
|
||||
# Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(
|
||||
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
|
||||
)
|
||||
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
|
||||
token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
|
||||
seq_length = compute_effective_axis_dimension(
|
||||
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
|
||||
)
|
||||
|
||||
# Generate dummy inputs according to compute batch and sequence
|
||||
dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
|
||||
common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
|
||||
return common_inputs
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
elif self.task == "causal-lm":
|
||||
common_inputs = self._generate_dummy_inputs_for_causal_lm(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
else:
|
||||
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
def _flatten_past_key_values_(self, flattened_output, name, idx, t):
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
|
||||
else:
|
||||
flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
|
||||
flattened_output, name, idx, t
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BigBirdPegasusConfig", "BigBirdPegasusOnnxConfig"]
|
||||
|
@ -14,7 +14,15 @@
|
||||
# limitations under the License.
|
||||
"""Blenderbot model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any
|
||||
|
||||
from ... import PreTrainedTokenizer
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...file_utils import is_torch_available
|
||||
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
|
||||
from ...onnx.utils import compute_effective_axis_dimension
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -158,4 +166,227 @@ class BlenderbotConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["BlenderbotConfig"]
|
||||
class BlenderbotOnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
if self.use_past:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
elif self.task == "causal-lm":
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
if self.use_past:
|
||||
_, num_decoder_layers = self.num_layers
|
||||
for i in range(num_decoder_layers):
|
||||
common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "encoder_sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "encoder_sequence"}),
|
||||
("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
|
||||
("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.outputs
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task in ["default", "seq2seq-lm"]:
|
||||
common_outputs = super().outputs
|
||||
else:
|
||||
common_outputs = super(OnnxConfigWithPast, self).outputs
|
||||
if self.use_past:
|
||||
num_encoder_layers, _ = self.num_layers
|
||||
for i in range(num_encoder_layers):
|
||||
common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
|
||||
return common_outputs
|
||||
|
||||
def _generate_dummy_inputs_for_default_and_seq2seq_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
# Generate decoder inputs
|
||||
decoder_seq_length = seq_length if not self.use_past else 1
|
||||
decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, decoder_seq_length, is_pair
|
||||
)
|
||||
decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
|
||||
common_inputs = dict(**encoder_inputs, **decoder_inputs)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, encoder_seq_length = common_inputs["input_ids"].shape
|
||||
decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
|
||||
num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
|
||||
encoder_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
encoder_seq_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
decoder_past_length = decoder_seq_length
|
||||
decoder_shape = (
|
||||
batch,
|
||||
num_decoder_attention_heads,
|
||||
decoder_past_length,
|
||||
self._config.hidden_size // num_decoder_attention_heads,
|
||||
)
|
||||
common_inputs["decoder_attention_mask"] = torch.cat(
|
||||
[common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
|
||||
)
|
||||
common_inputs["past_key_values"] = []
|
||||
_, num_decoder_layers = self.num_layers
|
||||
|
||||
for _ in range(num_decoder_layers):
|
||||
common_inputs["past_key_values"].append(
|
||||
(
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(decoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
torch.zeros(encoder_shape),
|
||||
)
|
||||
)
|
||||
return common_inputs
|
||||
|
||||
def _generate_dummy_inputs_for_causal_lm(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
|
||||
tokenizer, batch_size, seq_length, is_pair
|
||||
)
|
||||
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
past_key_values_length = seqlen
|
||||
_, num_decoder_layers = self.num_layers
|
||||
num_encoder_attention_heads, _ = self.num_attention_heads
|
||||
past_shape = (
|
||||
batch,
|
||||
num_encoder_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // num_encoder_attention_heads,
|
||||
)
|
||||
mask_dtype = common_inputs["attention_mask"].dtype
|
||||
common_inputs["attention_mask"] = torch.cat(
|
||||
[common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
common_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_decoder_layers)
|
||||
]
|
||||
return common_inputs
|
||||
|
||||
    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._generate_dummy_inputs_for_sequence_classification_and_question_answering
    def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        # Copied from OnnxConfig.generate_dummy_inputs
        # Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )

        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )

        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
        common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
        return common_inputs

    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.generate_dummy_inputs
    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        elif self.task == "causal-lm":
            common_inputs = self._generate_dummy_inputs_for_causal_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )
        else:
            common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        return common_inputs

    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._flatten_past_key_values_
    def _flatten_past_key_values_(self, flattened_output, name, idx, t):
        if self.task in ["default", "seq2seq-lm"]:
            flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
        else:
            flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
                flattened_output, name, idx, t
            )

    def fill_with_past_key_values_(self, inputs_or_outputs: Mapping[str, Mapping[int, str]], direction: str):
        if direction not in ["inputs", "outputs"]:
            raise ValueError(f'direction must either be "inputs" or "outputs", but {direction} was given')

        name = "past_key_values" if direction == "inputs" else "present"
        _, num_decoder_layers = self.num_layers

        encoder_sequence = "past_encoder_sequence"
        decoder_sequence = "past_decoder_sequence" if direction == "inputs" else "past_decoder_sequence + sequence"

        for i in range(num_decoder_layers):
            inputs_or_outputs[f"{name}.{i}.decoder.key"] = {0: "batch", 2: decoder_sequence}
            inputs_or_outputs[f"{name}.{i}.decoder.value"] = {0: "batch", 2: decoder_sequence}
            inputs_or_outputs[f"{name}.{i}.encoder.key"] = {0: "batch", 2: encoder_sequence}
            inputs_or_outputs[f"{name}.{i}.encoder.value"] = {0: "batch", 2: encoder_sequence}


__all__ = ["BlenderbotConfig", "BlenderbotOnnxConfig"]

@ -14,7 +14,15 @@
# limitations under the License.
"""BlenderbotSmall model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any

from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...file_utils import is_torch_available
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import logging


@ -156,4 +164,224 @@ class BlenderbotSmallConfig(PreTrainedConfig):
        )


__all__ = ["BlenderbotSmallConfig"]
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig with Bart->BlenderbotSmall
class BlenderbotSmallOnnxConfig(OnnxSeq2SeqConfigWithPast):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )

            if self.use_past:
                common_inputs["decoder_input_ids"] = {0: "batch"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
            else:
                common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}

            if self.use_past:
                self.fill_with_past_key_values_(common_inputs, direction="inputs")
        elif self.task == "causal-lm":
            # TODO: figure this case out.
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )
            if self.use_past:
                num_encoder_layers, _ = self.num_layers
                for i in range(num_encoder_layers):
                    common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        else:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                    ("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
                    ("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
                ]
            )

        return common_inputs

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_outputs = super().outputs
        else:
            common_outputs = super(OnnxConfigWithPast, self).outputs
            if self.use_past:
                num_encoder_layers, _ = self.num_layers
                for i in range(num_encoder_layers):
                    common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        return common_outputs

    def _generate_dummy_inputs_for_default_and_seq2seq_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        # Generate decoder inputs
        decoder_seq_length = seq_length if not self.use_past else 1
        decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, decoder_seq_length, is_pair
        )
        decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
        common_inputs = dict(**encoder_inputs, **decoder_inputs)

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, encoder_seq_length = common_inputs["input_ids"].shape
            decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
            num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
            encoder_shape = (
                batch,
                num_encoder_attention_heads,
                encoder_seq_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )
            decoder_past_length = decoder_seq_length + 3
            decoder_shape = (
                batch,
                num_decoder_attention_heads,
                decoder_past_length,
                self._config.hidden_size // num_decoder_attention_heads,
            )

            common_inputs["decoder_attention_mask"] = torch.cat(
                [common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
            )

            common_inputs["past_key_values"] = []
            # If the number of encoder and decoder layers are present in the model configuration, both are considered
            num_encoder_layers, num_decoder_layers = self.num_layers
            min_num_layers = min(num_encoder_layers, num_decoder_layers)
            max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
            remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"

            for _ in range(min_num_layers):
                common_inputs["past_key_values"].append(
                    (
                        torch.zeros(decoder_shape),
                        torch.zeros(decoder_shape),
                        torch.zeros(encoder_shape),
                        torch.zeros(encoder_shape),
                    )
                )
            # TODO: test this.
            shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
            for _ in range(min_num_layers, max_num_layers):
                common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
        return common_inputs

    def _generate_dummy_inputs_for_causal_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, seqlen = common_inputs["input_ids"].shape
            # Not using the same length for past_key_values
            past_key_values_length = seqlen + 2
            num_encoder_layers, _ = self.num_layers
            num_encoder_attention_heads, _ = self.num_attention_heads
            past_shape = (
                batch,
                num_encoder_attention_heads,
                past_key_values_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )

            mask_dtype = common_inputs["attention_mask"].dtype
            common_inputs["attention_mask"] = torch.cat(
                [common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )
            common_inputs["past_key_values"] = [
                (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
            ]
        return common_inputs

    def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        # Copied from OnnxConfig.generate_dummy_inputs
        # Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )

        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )

        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
        common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
        return common_inputs

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        elif self.task == "causal-lm":
            common_inputs = self._generate_dummy_inputs_for_causal_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )
        else:
            common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        return common_inputs

    def _flatten_past_key_values_(self, flattened_output, name, idx, t):
        if self.task in ["default", "seq2seq-lm"]:
            flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
        else:
            flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
                flattened_output, name, idx, t
            )


__all__ = ["BlenderbotSmallConfig", "BlenderbotSmallOnnxConfig"]

@ -14,8 +14,19 @@
# limitations under the License.
"""Bloom configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Optional

from packaging import version


if TYPE_CHECKING:
    from ... import PreTrainedTokenizer

from ...configuration_utils import PreTrainedConfig
from ...utils import logging
from ...onnx import OnnxConfigWithPast, PatchingSpec
from ...utils import is_torch_available, logging


logger = logging.get_logger(__name__)
@ -131,4 +142,99 @@ class BloomConfig(PreTrainedConfig):
        super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)


__all__ = ["BloomConfig"]
class BloomOnnxConfig(OnnxConfigWithPast):
    torch_onnx_minimum_version = version.parse("1.12")

    def __init__(
        self,
        config: PreTrainedConfig,
        task: str = "default",
        patching_specs: Optional[list[PatchingSpec]] = None,
        use_past: bool = False,
    ):
        super().__init__(config, task=task, patching_specs=patching_specs, use_past=use_past)
        if not getattr(self._config, "pad_token_id", None):
            # TODO: how to do that better?
            self._config.pad_token_id = 0

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
        if self.use_past:
            # BLOOM stores values on dynamic axis 2. For more details see: https://github.com/huggingface/transformers/pull/18344
            self.fill_with_past_key_values_(common_inputs, direction="inputs", inverted_values_shape=True)
            common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
        else:
            common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}

        return common_inputs

    @property
    def num_layers(self) -> int:
        return self._config.n_layer

    @property
    def num_attention_heads(self) -> int:
        return self._config.n_head

    @property
    def atol_for_validation(self) -> float:
        return 1e-3

    def generate_dummy_inputs(
        self,
        tokenizer: "PreTrainedTokenizer",
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
            tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
            is_pair=is_pair,
        )

        # We need to order the input in the way they appears in the forward()
        ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})

        # Need to add the past_keys
        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch

                batch, seqlen = common_inputs["input_ids"].shape
                # Not using the same length for past_key_values
                past_key_values_length = seqlen + 2
                head_dim = self._config.hidden_size // self.num_attention_heads
                past_key_shape = (
                    batch * self.num_attention_heads,
                    head_dim,
                    past_key_values_length,
                )
                past_value_shape = (
                    batch * self.num_attention_heads,
                    past_key_values_length,
                    head_dim,
                )
                ordered_inputs["past_key_values"] = [
                    (torch.zeros(past_key_shape), torch.zeros(past_value_shape)) for _ in range(self.num_layers)
                ]

        ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
        if self.use_past:
            mask_dtype = ordered_inputs["attention_mask"].dtype
            ordered_inputs["attention_mask"] = torch.cat(
                [ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )

        return ordered_inputs

    @property
    def default_onnx_opset(self) -> int:
        return 13


__all__ = ["BloomConfig", "BloomOnnxConfig"]

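For readers scanning the diff, here is a minimal sketch of exercising the `BloomOnnxConfig` added above to inspect the dummy `past_key_values` it builds. The tiny config values, the checkpoint name, the `with_past` helper from the legacy `transformers.onnx` API, and the import path are illustrative assumptions, not part of this diff; PyTorch is required for the dummy tensors.

```python
from transformers import AutoTokenizer, BloomConfig
from transformers.models.bloom.configuration_bloom import BloomOnnxConfig  # assumed import path

config = BloomConfig(hidden_size=64, n_layer=2, n_head=4)
onnx_config = BloomOnnxConfig.with_past(config)  # assumed legacy helper that sets use_past=True
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")  # example checkpoint only

dummy = onnx_config.generate_dummy_inputs(tokenizer, batch_size=2, seq_length=8)
key, value = dummy["past_key_values"][0]
# Per the inputs property above, BLOOM keys are (batch * n_head, head_dim, past_len)
# while values are (batch * n_head, past_len, head_dim).
print(key.shape, value.shape)
```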
@ -15,7 +15,11 @@
# limitations under the License.
"""CamemBERT configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -125,4 +129,19 @@ class CamembertConfig(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["CamembertConfig"]
class CamembertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["CamembertConfig", "CamembertOnnxConfig"]

@ -14,7 +14,16 @@
# limitations under the License.
"""Chinese-CLIP model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any


if TYPE_CHECKING:
    from ...processing_utils import ProcessorMixin

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -359,4 +368,52 @@ class ChineseCLIPConfig(PreTrainedConfig):
        super().__init__(**kwargs)


__all__ = ["ChineseCLIPConfig", "ChineseCLIPTextConfig", "ChineseCLIPVisionConfig"]
class ChineseCLIPOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("logits_per_image", {0: "batch"}),
                ("logits_per_text", {0: "batch"}),
                ("text_embeds", {0: "batch"}),
                ("image_embeds", {0: "batch"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4

    def generate_dummy_inputs(
        self,
        processor: "ProcessorMixin",
        batch_size: int = -1,
        seq_length: int = -1,
    ) -> Mapping[str, Any]:
        text_input_dict = super().generate_dummy_inputs(
            processor.tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
        )
        image_input_dict = super().generate_dummy_inputs(
            processor.image_processor,
            batch_size=batch_size,
        )
        return {**text_input_dict, **image_input_dict}

    @property
    def default_onnx_opset(self) -> int:
        return 14


__all__ = ["ChineseCLIPConfig", "ChineseCLIPOnnxConfig", "ChineseCLIPTextConfig", "ChineseCLIPVisionConfig"]

@ -14,7 +14,16 @@
# limitations under the License.
"""CLIP model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any


if TYPE_CHECKING:
    from ...processing_utils import ProcessorMixin

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -355,4 +364,52 @@ class CLIPConfig(PreTrainedConfig):
        super().__init__(**kwargs)


__all__ = ["CLIPConfig", "CLIPTextConfig", "CLIPVisionConfig"]
class CLIPOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("logits_per_image", {0: "batch"}),
                ("logits_per_text", {0: "batch"}),
                ("text_embeds", {0: "batch"}),
                ("image_embeds", {0: "batch"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4

    def generate_dummy_inputs(
        self,
        processor: "ProcessorMixin",
        batch_size: int = -1,
        seq_length: int = -1,
    ) -> Mapping[str, Any]:
        text_input_dict = super().generate_dummy_inputs(
            processor.tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
        )
        image_input_dict = super().generate_dummy_inputs(
            processor.image_processor,
            batch_size=batch_size,
        )
        return {**text_input_dict, **image_input_dict}

    @property
    def default_onnx_opset(self) -> int:
        return 14


__all__ = ["CLIPConfig", "CLIPOnnxConfig", "CLIPTextConfig", "CLIPVisionConfig"]

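As a quick illustration of the dual text/image dummy-input path in `CLIPOnnxConfig.generate_dummy_inputs` above, a sketch follows. The checkpoint name and import path are only examples, and the legacy export utilities plus PyTorch and Pillow are assumed to be available.

```python
from transformers import CLIPConfig, CLIPProcessor
from transformers.models.clip.configuration_clip import CLIPOnnxConfig  # assumed import path

config = CLIPConfig()
onnx_config = CLIPOnnxConfig(config)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")  # example checkpoint only

# The config routes processor.tokenizer and processor.image_processor through the base
# generate_dummy_inputs and merges the two dictionaries.
dummy = onnx_config.generate_dummy_inputs(processor)
print(sorted(dummy.keys()))             # expected: attention_mask, input_ids, pixel_values
print(onnx_config.default_onnx_opset)   # 14, as defined above
```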
@ -14,7 +14,13 @@
# limitations under the License.
"""CodeGen model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any, Optional

from ... import PreTrainedTokenizer, is_torch_available
from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfigWithPast, PatchingSpec
from ...utils import logging


@ -140,4 +146,85 @@ class CodeGenConfig(PreTrainedConfig):
        )


__all__ = ["CodeGenConfig"]
# Copied from transformers.models.gpt2.configuration_gpt2.GPT2OnnxConfig with GPT2->CodeGen
class CodeGenOnnxConfig(OnnxConfigWithPast):
    def __init__(
        self,
        config: PreTrainedConfig,
        task: str = "default",
        patching_specs: Optional[list[PatchingSpec]] = None,
        use_past: bool = False,
    ):
        super().__init__(config, task=task, patching_specs=patching_specs, use_past=use_past)
        if not getattr(self._config, "pad_token_id", None):
            # TODO: how to do that better?
            self._config.pad_token_id = 0

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
        if self.use_past:
            self.fill_with_past_key_values_(common_inputs, direction="inputs")
            common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
        else:
            common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}

        return common_inputs

    @property
    def num_layers(self) -> int:
        return self._config.n_layer

    @property
    def num_attention_heads(self) -> int:
        return self._config.n_head

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
            tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
        )

        # We need to order the input in the way they appears in the forward()
        ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})

        # Need to add the past_keys
        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch

                batch, seqlen = common_inputs["input_ids"].shape
                # Not using the same length for past_key_values
                past_key_values_length = seqlen + 2
                past_shape = (
                    batch,
                    self.num_attention_heads,
                    past_key_values_length,
                    self._config.hidden_size // self.num_attention_heads,
                )
                ordered_inputs["past_key_values"] = [
                    (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(self.num_layers)
                ]

        ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
        if self.use_past:
            mask_dtype = ordered_inputs["attention_mask"].dtype
            ordered_inputs["attention_mask"] = torch.cat(
                [ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )

        return ordered_inputs

    @property
    def default_onnx_opset(self) -> int:
        return 13


__all__ = ["CodeGenConfig", "CodeGenOnnxConfig"]

@ -14,7 +14,13 @@
# limitations under the License.
"""Conditional DETR model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
from ...utils.backbone_utils import verify_backbone_config_arguments
from ..auto import CONFIG_MAPPING, AutoConfig
@ -241,4 +247,25 @@ class ConditionalDetrConfig(PreTrainedConfig):
        super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs)


__all__ = ["ConditionalDetrConfig"]
class ConditionalDetrOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ("pixel_mask", {0: "batch"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-5

    @property
    def default_onnx_opset(self) -> int:
        return 12


__all__ = ["ConditionalDetrConfig", "ConditionalDetrOnnxConfig"]

@ -14,7 +14,11 @@
# limitations under the License.
"""ConvBERT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -136,4 +140,21 @@ class ConvBertConfig(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["ConvBertConfig"]
# Copied from transformers.models.bert.configuration_bert.BertOnnxConfig
class ConvBertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
                ("token_type_ids", dynamic_axis),
            ]
        )


__all__ = ["ConvBertConfig", "ConvBertOnnxConfig"]

@ -14,7 +14,13 @@
# limitations under the License.
"""ConvNeXT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices

@ -117,4 +123,20 @@ class ConvNextConfig(BackboneConfigMixin, PreTrainedConfig):
        )


__all__ = ["ConvNextConfig"]
class ConvNextOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-5


__all__ = ["ConvNextConfig", "ConvNextOnnxConfig"]

@ -14,7 +14,11 @@
# limitations under the License.
"""Data2VecText configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -124,4 +128,19 @@ class Data2VecTextConfig(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["Data2VecTextConfig"]
class Data2VecTextOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["Data2VecTextConfig", "Data2VecTextOnnxConfig"]

@ -14,7 +14,13 @@
# limitations under the License.
"""Data2VecVision model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -168,4 +174,21 @@ class Data2VecVisionConfig(PreTrainedConfig):
        self.semantic_loss_ignore_index = semantic_loss_ignore_index


__all__ = ["Data2VecVisionConfig"]
# Copied from transformers.models.vit.configuration_vit.ViTOnnxConfig
class Data2VecVisionOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["Data2VecVisionConfig", "Data2VecVisionOnnxConfig"]

@ -14,10 +14,19 @@
# limitations under the License.
"""DeBERTa model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Union

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


if TYPE_CHECKING:
    from ... import FeatureExtractionMixin, PreTrainedTokenizerBase


logger = logging.get_logger(__name__)


@ -150,4 +159,41 @@ class DebertaConfig(PreTrainedConfig):
        self.legacy = legacy


__all__ = ["DebertaConfig"]
# Copied from transformers.models.deberta_v2.configuration_deberta_v2.DebertaV2OnnxConfig
class DebertaOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        if self._config.type_vocab_size > 0:
            return OrderedDict(
                [("input_ids", dynamic_axis), ("attention_mask", dynamic_axis), ("token_type_ids", dynamic_axis)]
            )
        else:
            return OrderedDict([("input_ids", dynamic_axis), ("attention_mask", dynamic_axis)])

    @property
    def default_onnx_opset(self) -> int:
        return 12

    def generate_dummy_inputs(
        self,
        preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
        batch_size: int = -1,
        seq_length: int = -1,
        num_choices: int = -1,
        is_pair: bool = False,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
        tokenizer: "PreTrainedTokenizerBase" = None,
    ) -> Mapping[str, Any]:
        dummy_inputs = super().generate_dummy_inputs(preprocessor=preprocessor)
        if self._config.type_vocab_size == 0 and "token_type_ids" in dummy_inputs:
            del dummy_inputs["token_type_ids"]
        return dummy_inputs


__all__ = ["DebertaConfig", "DebertaOnnxConfig"]

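A small sketch (not part of the diff) of the `token_type_ids` handling that `DebertaOnnxConfig` implements above: when the config has no token-type vocabulary, the axis is dropped from both the declared inputs and the generated dummies. The checkpoint name and import path are only examples.

```python
from transformers import AutoTokenizer, DebertaConfig
from transformers.models.deberta.configuration_deberta import DebertaOnnxConfig  # assumed import path

config = DebertaConfig(type_vocab_size=0)
onnx_config = DebertaOnnxConfig(config)
print(list(onnx_config.inputs.keys()))  # no token_type_ids axis when type_vocab_size == 0

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")  # example checkpoint only
dummy = onnx_config.generate_dummy_inputs(tokenizer)
print("token_type_ids" in dummy)  # False for this config
```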
@ -14,10 +14,19 @@
# limitations under the License.
"""DeBERTa-v2 model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Union

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


if TYPE_CHECKING:
    from ... import FeatureExtractionMixin, PreTrainedTokenizerBase


logger = logging.get_logger(__name__)


@ -150,4 +159,40 @@ class DebertaV2Config(PreTrainedConfig):
        self.legacy = legacy


__all__ = ["DebertaV2Config"]
class DebertaV2OnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        if self._config.type_vocab_size > 0:
            return OrderedDict(
                [("input_ids", dynamic_axis), ("attention_mask", dynamic_axis), ("token_type_ids", dynamic_axis)]
            )
        else:
            return OrderedDict([("input_ids", dynamic_axis), ("attention_mask", dynamic_axis)])

    @property
    def default_onnx_opset(self) -> int:
        return 12

    def generate_dummy_inputs(
        self,
        preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
        batch_size: int = -1,
        seq_length: int = -1,
        num_choices: int = -1,
        is_pair: bool = False,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
        tokenizer: "PreTrainedTokenizerBase" = None,
    ) -> Mapping[str, Any]:
        dummy_inputs = super().generate_dummy_inputs(preprocessor=preprocessor)
        if self._config.type_vocab_size == 0 and "token_type_ids" in dummy_inputs:
            del dummy_inputs["token_type_ids"]
        return dummy_inputs


__all__ = ["DebertaV2Config", "DebertaV2OnnxConfig"]

@ -14,7 +14,13 @@
# limitations under the License.
"""DeiT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -125,4 +131,20 @@ class DeiTConfig(PreTrainedConfig):
        self.pooler_act = pooler_act


__all__ = ["DeiTConfig"]
class DeiTOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["DeiTConfig", "DeiTOnnxConfig"]

@ -14,7 +14,11 @@
# limitations under the License.
"""MEGA configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ....configuration_utils import PreTrainedConfig
from ....onnx import OnnxConfig
from ....utils import logging


@ -221,4 +225,19 @@ class MegaConfig(PreTrainedConfig):
        self.num_attention_heads = 1  # not used but required by Hugging Face


__all__ = ["MegaConfig"]
class MegaOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["MegaConfig", "MegaOnnxConfig"]

@ -14,7 +14,13 @@
# limitations under the License.
"""DETR model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
from ...utils.backbone_utils import verify_backbone_config_arguments
from ..auto import CONFIG_MAPPING, AutoConfig
@ -240,4 +246,25 @@ class DetrConfig(PreTrainedConfig):
        super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs)


__all__ = ["DetrConfig"]
class DetrOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ("pixel_mask", {0: "batch"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-5

    @property
    def default_onnx_opset(self) -> int:
        return 12


__all__ = ["DetrConfig", "DetrOnnxConfig"]

@ -14,7 +14,13 @@
# limitations under the License.
"""DINOv2 model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices

@ -154,4 +160,20 @@ class Dinov2Config(BackboneConfigMixin, PreTrainedConfig):
        self.use_mask_token = use_mask_token


__all__ = ["Dinov2Config"]
class Dinov2OnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["Dinov2Config", "Dinov2OnnxConfig"]

@ -14,7 +14,11 @@
# limitations under the License.
"""DistilBERT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -119,4 +123,19 @@ class DistilBertConfig(PreTrainedConfig):
        super().__init__(**kwargs, pad_token_id=pad_token_id)


__all__ = ["DistilBertConfig"]
class DistilBertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["DistilBertConfig", "DistilBertOnnxConfig"]

@ -14,7 +14,13 @@
# limitations under the License.
"""EfficientNet model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -144,4 +150,20 @@ class EfficientNetConfig(PreTrainedConfig):
        self.num_hidden_layers = sum(num_block_repeats) * 4


__all__ = ["EfficientNetConfig"]
class EfficientNetOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-5


__all__ = ["EfficientNetConfig", "EfficientNetOnnxConfig"]

@ -15,7 +15,11 @@
# limitations under the License.
"""ELECTRA model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -156,4 +160,20 @@ class ElectraConfig(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["ElectraConfig"]
class ElectraOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
                ("token_type_ids", dynamic_axis),
            ]
        )


__all__ = ["ElectraConfig", "ElectraOnnxConfig"]

@ -15,7 +15,11 @@
# limitations under the License.
"""ERNIE model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -131,4 +135,21 @@ class ErnieConfig(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["ErnieConfig"]
class ErnieOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
                ("token_type_ids", dynamic_axis),
                ("task_type_ids", dynamic_axis),
            ]
        )


__all__ = ["ErnieConfig", "ErnieOnnxConfig"]

@ -14,7 +14,11 @@
# limitations under the License.
"""Flaubert configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@ -213,4 +217,19 @@ class FlaubertConfig(PreTrainedConfig):
        super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, **kwargs)


__all__ = ["FlaubertConfig"]
class FlaubertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["FlaubertConfig", "FlaubertOnnxConfig"]

@ -15,7 +15,13 @@
|
||||
# limitations under the License.
|
||||
"""OpenAI GPT-2 configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any, Optional
|
||||
|
||||
from ... import PreTrainedTokenizer, is_torch_available
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfigWithPast, PatchingSpec
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -184,4 +190,84 @@ class GPT2Config(PreTrainedConfig):
|
||||
super().__init__(bos_token_id=bos_token_id, eos_token_id=eos_token_id, **kwargs)
|
||||
|
||||
|
||||
__all__ = ["GPT2Config"]
|
||||
class GPT2OnnxConfig(OnnxConfigWithPast):
|
||||
def __init__(
|
||||
self,
|
||||
config: PreTrainedConfig,
|
||||
task: str = "default",
|
||||
patching_specs: Optional[list[PatchingSpec]] = None,
|
||||
use_past: bool = False,
|
||||
):
|
||||
super().__init__(config, task=task, patching_specs=patching_specs, use_past=use_past)
|
||||
if not getattr(self._config, "pad_token_id", None):
|
||||
# TODO: how to do that better?
|
||||
self._config.pad_token_id = 0
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def num_layers(self) -> int:
|
||||
return self._config.n_layer
|
||||
|
||||
@property
|
||||
def num_attention_heads(self) -> int:
|
||||
return self._config.n_head
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
# We need to order the input in the way they appears in the forward()
|
||||
ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})
|
||||
|
||||
# Need to add the past_keys
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
# Not using the same length for past_key_values
|
||||
past_key_values_length = seqlen + 2
|
||||
past_shape = (
|
||||
batch,
|
||||
self.num_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // self.num_attention_heads,
|
||||
)
|
||||
ordered_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(self.num_layers)
|
||||
]
|
||||
|
||||
ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
|
||||
if self.use_past:
|
||||
mask_dtype = ordered_inputs["attention_mask"].dtype
|
||||
ordered_inputs["attention_mask"] = torch.cat(
|
||||
[ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
|
||||
return ordered_inputs
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
|
||||
__all__ = ["GPT2Config", "GPT2OnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""GPT Neo model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any
|
||||
|
||||
from ... import PreTrainedTokenizer, is_torch_available
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -199,4 +205,71 @@ def custom_get_block_length_and_num_blocks(seq_length, window_size):
|
||||
return largest_divisor, torch.div(seq_length, largest_divisor, rounding_mode="floor")
|
||||
|
||||
|
||||
__all__ = ["GPTNeoConfig"]
|
||||
class GPTNeoOnnxConfig(OnnxConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def num_attention_heads(self) -> int:
|
||||
return self._config.num_heads
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
|
||||
tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
is_pair=is_pair,
|
||||
)
|
||||
|
||||
# We need to order the input in the way they appears in the forward()
|
||||
ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})
|
||||
|
||||
# Need to add the past_keys
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
# Not using the same length for past_key_values
|
||||
past_key_values_length = seqlen + 2
|
||||
past_shape = (
|
||||
batch,
|
||||
self.num_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // self.num_attention_heads,
|
||||
)
|
||||
ordered_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(self.num_layers)
|
||||
]
|
||||
|
||||
ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
|
||||
if self.use_past:
|
||||
mask_dtype = ordered_inputs["attention_mask"].dtype
|
||||
ordered_inputs["attention_mask"] = torch.cat(
|
||||
[ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
|
||||
return ordered_inputs
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
|
||||
__all__ = ["GPTNeoConfig", "GPTNeoOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""GPT-J model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import Any, Optional
|
||||
|
||||
from ... import PreTrainedTokenizer, is_torch_available
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfigWithPast, PatchingSpec
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -129,4 +135,85 @@ class GPTJConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["GPTJConfig"]
|
||||
# Copied from transformers.models.gpt2.configuration_gpt2.GPT2OnnxConfig
|
||||
class GPTJOnnxConfig(OnnxConfigWithPast):
|
||||
def __init__(
|
||||
self,
|
||||
config: PreTrainedConfig,
|
||||
task: str = "default",
|
||||
patching_specs: Optional[list[PatchingSpec]] = None,
|
||||
use_past: bool = False,
|
||||
):
|
||||
super().__init__(config, task=task, patching_specs=patching_specs, use_past=use_past)
|
||||
if not getattr(self._config, "pad_token_id", None):
|
||||
# TODO: how to do that better?
|
||||
self._config.pad_token_id = 0
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = OrderedDict({"input_ids": {0: "batch", 1: "sequence"}})
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "past_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "sequence"}
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def num_layers(self) -> int:
|
||||
return self._config.n_layer
|
||||
|
||||
@property
|
||||
def num_attention_heads(self) -> int:
|
||||
return self._config.n_head
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: PreTrainedTokenizer,
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
common_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
|
||||
tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
|
||||
)
|
||||
|
||||
# We need to order the input in the way they appears in the forward()
|
||||
ordered_inputs = OrderedDict({"input_ids": common_inputs["input_ids"]})
|
||||
|
||||
# Need to add the past_keys
|
||||
if self.use_past:
|
||||
if not is_torch_available():
|
||||
raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
|
||||
else:
|
||||
import torch
|
||||
|
||||
batch, seqlen = common_inputs["input_ids"].shape
|
||||
# Not using the same length for past_key_values
|
||||
past_key_values_length = seqlen + 2
|
||||
past_shape = (
|
||||
batch,
|
||||
self.num_attention_heads,
|
||||
past_key_values_length,
|
||||
self._config.hidden_size // self.num_attention_heads,
|
||||
)
|
||||
ordered_inputs["past_key_values"] = [
|
||||
(torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(self.num_layers)
|
||||
]
|
||||
|
||||
ordered_inputs["attention_mask"] = common_inputs["attention_mask"]
|
||||
if self.use_past:
|
||||
mask_dtype = ordered_inputs["attention_mask"].dtype
|
||||
ordered_inputs["attention_mask"] = torch.cat(
|
||||
[ordered_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
|
||||
)
|
||||
|
||||
return ordered_inputs
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
|
||||
__all__ = ["GPTJConfig", "GPTJOnnxConfig"]
|
||||
|
@ -14,10 +14,19 @@
|
||||
# limitations under the License.
|
||||
"""GroupViT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ...processing_utils import ProcessorMixin
|
||||
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
|
||||
@ -351,4 +360,52 @@ class GroupViTConfig(PreTrainedConfig):
|
||||
super().__init__(**kwargs)
|
||||
|
||||
|
||||
__all__ = ["GroupViTConfig", "GroupViTTextConfig", "GroupViTVisionConfig"]
|
||||
class GroupViTOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "sequence"}),
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
("attention_mask", {0: "batch", 1: "sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("logits_per_image", {0: "batch"}),
|
||||
("logits_per_text", {0: "batch"}),
|
||||
("text_embeds", {0: "batch"}),
|
||||
("image_embeds", {0: "batch"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
processor: "ProcessorMixin",
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
) -> Mapping[str, Any]:
|
||||
text_input_dict = super().generate_dummy_inputs(
|
||||
processor.tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
)
|
||||
image_input_dict = super().generate_dummy_inputs(
|
||||
processor.image_processor,
|
||||
batch_size=batch_size,
|
||||
)
|
||||
return {**text_input_dict, **image_input_dict}
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 14
|
||||
|
||||
|
||||
__all__ = ["GroupViTConfig", "GroupViTOnnxConfig", "GroupViTTextConfig", "GroupViTVisionConfig"]
|
||||
|
@@ -16,7 +16,11 @@
# limitations under the License.
"""I-BERT configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -112,4 +116,19 @@ class IBertConfig(PreTrainedConfig):
        self.force_dequant = force_dequant


__all__ = ["IBertConfig"]
class IBertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )


__all__ = ["IBertConfig", "IBertOnnxConfig"]
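As with the other text-model configurations in this branch, the exported dynamic axes depend on the task. A quick sanity check, assuming the classes are importable from `transformers.models.ibert` as declared in `__all__` above:

```python
from transformers.models.ibert import IBertConfig, IBertOnnxConfig

config = IBertConfig()
# default task: batch and sequence are the dynamic axes
print(dict(IBertOnnxConfig(config).inputs))
# {'input_ids': {0: 'batch', 1: 'sequence'}, 'attention_mask': {0: 'batch', 1: 'sequence'}}
# the multiple-choice task inserts an extra "choice" axis
print(dict(IBertOnnxConfig(config, task="multiple-choice").inputs))
```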
@@ -14,10 +14,18 @@
# limitations under the License.
"""OpenAI ImageGPT configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


if TYPE_CHECKING:
    from ... import FeatureExtractionMixin

logger = logging.get_logger(__name__)


@@ -136,4 +144,54 @@ class ImageGPTConfig(PreTrainedConfig):
        super().__init__(tie_word_embeddings=tie_word_embeddings, **kwargs)


__all__ = ["ImageGPTConfig"]
class ImageGPTOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
            ]
        )

    def generate_dummy_inputs(
        self,
        preprocessor: "FeatureExtractionMixin",
        batch_size: int = 1,
        seq_length: int = -1,
        is_pair: bool = False,
        num_channels: int = 3,
        image_width: int = 32,
        image_height: int = 32,
    ) -> Mapping[str, Any]:
        """
        Generate inputs to provide to the ONNX exporter.

        Args:
            preprocessor ([`PreTrainedTokenizerBase`] or [`FeatureExtractionMixin`]):
                The preprocessor associated with this model configuration.
            batch_size (`int`, *optional*, defaults to 1):
                The batch size to export the model for (-1 means dynamic axis).
            seq_length (`int`, *optional*, defaults to -1):
                The sequence length to export the model for (-1 means dynamic axis).
            is_pair (`bool`, *optional*, defaults to `False`):
                Indicate if the input is a pair (sentence 1, sentence 2)
            num_channels (`int`, *optional*, defaults to 3):
                The number of channels of the generated images.
            image_width (`int`, *optional*, defaults to 32):
                The width of the generated images.
            image_height (`int`, *optional*, defaults to 32):
                The height of the generated images.

        Returns:
            Mapping[str, Tensor] holding the kwargs to provide to the model's forward function
        """

        input_image = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
        inputs = dict(preprocessor(images=input_image, return_tensors="pt"))

        return inputs


__all__ = ["ImageGPTConfig", "ImageGPTOnnxConfig"]
@@ -14,8 +14,13 @@
# limitations under the License.
"""LayoutLM model configuration"""

from ... import PreTrainedConfig
from ...utils import logging
from collections import OrderedDict
from collections.abc import Mapping
from typing import Any, Optional

from ... import PreTrainedConfig, PreTrainedTokenizer
from ...onnx import OnnxConfig, PatchingSpec
from ...utils import is_torch_available, logging


logger = logging.get_logger(__name__)
@@ -122,4 +127,64 @@ class LayoutLMConfig(PreTrainedConfig):
        self.max_2d_position_embeddings = max_2d_position_embeddings


__all__ = ["LayoutLMConfig"]
class LayoutLMOnnxConfig(OnnxConfig):
    def __init__(
        self,
        config: PreTrainedConfig,
        task: str = "default",
        patching_specs: Optional[list[PatchingSpec]] = None,
    ):
        super().__init__(config, task=task, patching_specs=patching_specs)
        self.max_2d_positions = config.max_2d_position_embeddings - 1

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("bbox", {0: "batch", 1: "sequence"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
                ("token_type_ids", {0: "batch", 1: "sequence"}),
            ]
        )

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        """
        Generate inputs to provide to the ONNX exporter

        Args:
            tokenizer: The tokenizer associated with this model configuration
            batch_size: The batch size (int) to export the model for (-1 means dynamic axis)
            seq_length: The sequence length (int) to export the model for (-1 means dynamic axis)
            is_pair: Indicate if the input is a pair (sentence 1, sentence 2)

        Returns:
            Mapping[str, Tensor] holding the kwargs to provide to the model's forward function
        """

        input_dict = super().generate_dummy_inputs(
            tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
            is_pair=is_pair,
        )

        # Generate a dummy bbox
        box = [48, 84, 73, 128]

        if not is_torch_available():
            raise ValueError("Cannot generate dummy inputs without PyTorch installed.")
        import torch

        batch_size, seq_length = input_dict["input_ids"].shape
        input_dict["bbox"] = torch.tensor([*[box] * seq_length]).tile(batch_size, 1, 1)
        return input_dict


__all__ = ["LayoutLMConfig", "LayoutLMOnnxConfig"]
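Because `generate_dummy_inputs` stacks a synthetic `bbox` tensor on top of the tokenizer outputs, the dummy batch can be inspected directly. A sketch, assuming PyTorch is installed and using the public `microsoft/layoutlm-base-uncased` tokenizer:

```python
from transformers import AutoTokenizer
from transformers.models.layoutlm import LayoutLMConfig, LayoutLMOnnxConfig

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
onnx_config = LayoutLMOnnxConfig(LayoutLMConfig())
dummy = onnx_config.generate_dummy_inputs(tokenizer)
# with dynamic axes (-1) this should resolve to a fixed 2 x 8 batch,
# and "bbox" should gain a trailing coordinate dimension of 4
print({name: tuple(tensor.shape) for name, tensor in dummy.items()})
```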
@@ -14,10 +14,22 @@
# limitations under the License.
"""LayoutLMv3 model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import logging


if TYPE_CHECKING:
    from ...processing_utils import ProcessorMixin


logger = logging.get_logger(__name__)


@@ -175,4 +187,104 @@ class LayoutLMv3Config(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["LayoutLMv3Config"]
class LayoutLMv3OnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.12")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        # The order of inputs is different for question answering and sequence classification
        if self.task in ["question-answering", "sequence-classification"]:
            return OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "sequence"}),
                    ("attention_mask", {0: "batch", 1: "sequence"}),
                    ("bbox", {0: "batch", 1: "sequence"}),
                    ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ]
            )
        else:
            return OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "sequence"}),
                    ("bbox", {0: "batch", 1: "sequence"}),
                    ("attention_mask", {0: "batch", 1: "sequence"}),
                    ("pixel_values", {0: "batch", 1: "num_channels"}),
                ]
            )

    @property
    def atol_for_validation(self) -> float:
        return 1e-5

    @property
    def default_onnx_opset(self) -> int:
        return 12

    def generate_dummy_inputs(
        self,
        processor: "ProcessorMixin",
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
    ) -> Mapping[str, Any]:
        """
        Generate inputs to provide to the ONNX exporter

        Args:
            processor ([`ProcessorMixin`]):
                The processor associated with this model configuration.
            batch_size (`int`, *optional*, defaults to -1):
                The batch size to export the model for (-1 means dynamic axis).
            seq_length (`int`, *optional*, defaults to -1):
                The sequence length to export the model for (-1 means dynamic axis).
            is_pair (`bool`, *optional*, defaults to `False`):
                Indicate if the input is a pair (sentence 1, sentence 2).
            num_channels (`int`, *optional*, defaults to 3):
                The number of channels of the generated images.
            image_width (`int`, *optional*, defaults to 40):
                The width of the generated images.
            image_height (`int`, *optional*, defaults to 40):
                The height of the generated images.

        Returns:
            Mapping[str, Any]: holding the kwargs to provide to the model's forward function
        """

        # A dummy image is used so OCR should not be applied
        setattr(processor.image_processor, "apply_ocr", False)

        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )
        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = processor.tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )
        # Generate dummy inputs according to compute batch and sequence
        dummy_text = [[" ".join([processor.tokenizer.unk_token]) * seq_length]] * batch_size

        # Generate dummy bounding boxes
        dummy_bboxes = [[[48, 84, 73, 128]]] * batch_size

        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        # batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
        dummy_image = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)

        inputs = dict(
            processor(
                dummy_image,
                text=dummy_text,
                boxes=dummy_bboxes,
                return_tensors="pt",
            )
        )

        return inputs


__all__ = ["LayoutLMv3Config", "LayoutLMv3OnnxConfig"]
@@ -14,7 +14,13 @@
# limitations under the License.
"""LeViT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -118,4 +124,21 @@ class LevitConfig(PreTrainedConfig):
        ]


__all__ = ["LevitConfig"]
# Copied from transformers.models.vit.configuration_vit.ViTOnnxConfig
class LevitOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["LevitConfig", "LevitOnnxConfig"]
@@ -14,12 +14,20 @@
# limitations under the License.
"""Longformer configuration"""

from typing import Union
from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any, Optional, Union

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


if TYPE_CHECKING:
    from ...onnx.config import PatchingSpec
    from ...tokenization_utils_base import PreTrainedTokenizerBase


logger = logging.get_logger(__name__)


@@ -131,4 +139,71 @@ class LongformerConfig(PreTrainedConfig):
        self.onnx_export = onnx_export


__all__ = ["LongformerConfig"]
class LongformerOnnxConfig(OnnxConfig):
    def __init__(
        self, config: "PreTrainedConfig", task: str = "default", patching_specs: "Optional[list[PatchingSpec]]" = None
    ):
        super().__init__(config, task, patching_specs)
        config.onnx_export = True

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
                ("global_attention_mask", dynamic_axis),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        outputs = super().outputs
        if self.task == "default":
            outputs["pooler_output"] = {0: "batch"}
        return outputs

    @property
    def atol_for_validation(self) -> float:
        """
        What absolute tolerance value to use during model conversion validation.

        Returns:
            Float absolute tolerance value.
        """
        return 1e-4

    @property
    def default_onnx_opset(self) -> int:
        # needs to be >= 14 to support tril operator
        return max(super().default_onnx_opset, 14)

    def generate_dummy_inputs(
        self,
        tokenizer: "PreTrainedTokenizerBase",
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        inputs = super().generate_dummy_inputs(
            preprocessor=tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
            is_pair=is_pair,
        )
        import torch

        # for some reason, replacing this code by inputs["global_attention_mask"] = torch.randint(2, inputs["input_ids"].shape, dtype=torch.int64)
        # makes the export fail randomly
        inputs["global_attention_mask"] = torch.zeros_like(inputs["input_ids"])
        # make every second token global
        inputs["global_attention_mask"][:, ::2] = 1

        return inputs


__all__ = ["LongformerConfig", "LongformerOnnxConfig"]
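The only Longformer-specific twist is the extra `global_attention_mask` input: the dummy generator marks every second token as global, which can be verified without exporting anything. A sketch, assuming PyTorch and the public `allenai/longformer-base-4096` tokenizer:

```python
from transformers import AutoTokenizer
from transformers.models.longformer import LongformerConfig, LongformerOnnxConfig

tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
onnx_config = LongformerOnnxConfig(LongformerConfig())
dummy = onnx_config.generate_dummy_inputs(tokenizer)
# every second position is flagged as a global-attention token
print(dummy["global_attention_mask"][0])
```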
@@ -14,7 +14,10 @@
# limitations under the License.
"""LongT5 model configuration"""

from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxSeq2SeqConfigWithPast
from ...utils import logging


@@ -149,4 +152,29 @@ class LongT5Config(PreTrainedConfig):
        )


__all__ = ["LongT5Config"]
class LongT5OnnxConfig(OnnxSeq2SeqConfigWithPast):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        common_inputs = {
            "input_ids": {0: "batch", 1: "encoder_sequence"},
            "attention_mask": {0: "batch", 1: "encoder_sequence"},
        }
        if self.use_past:
            common_inputs["attention_mask"][1] = "past_encoder_sequence + sequence"
            common_inputs["decoder_input_ids"] = {0: "batch"}
            common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
        else:
            common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
            common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}

        if self.use_past:
            self.fill_with_past_key_values_(common_inputs, direction="inputs")

        return common_inputs

    @property
    def default_onnx_opset(self) -> int:
        return 13


__all__ = ["LongT5Config", "LongT5OnnxConfig"]
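For the seq2seq-with-past configs, the interesting part is how the declared inputs change once `use_past` is enabled; the `with_past` constructor inherited from `OnnxSeq2SeqConfigWithPast` flips that flag. A hedged sketch (the exact cached-input key names are an assumption based on the base class behaviour):

```python
from transformers.models.longt5 import LongT5Config, LongT5OnnxConfig

config = LongT5Config()
print(list(LongT5OnnxConfig(config).inputs.keys()))
# ['input_ids', 'attention_mask', 'decoder_input_ids', 'decoder_attention_mask']

past_config = LongT5OnnxConfig.with_past(config)
# decoder_input_ids drops its sequence axis and cached key/value inputs appear
print(any(name.startswith("past_key_values.0.") for name in past_config.inputs))
```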
@@ -14,8 +14,15 @@
# limitations under the License.
"""M2M100 model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any

from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...utils import logging
from ...onnx import OnnxConfig, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import is_torch_available, logging


logger = logging.get_logger(__name__)
@@ -151,4 +158,125 @@ class M2M100Config(PreTrainedConfig):
        )


__all__ = ["M2M100Config"]
class M2M100OnnxConfig(OnnxSeq2SeqConfigWithPast):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        common_inputs = OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
            ]
        )

        if self.use_past:
            common_inputs["decoder_input_ids"] = {0: "batch"}
            common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
        else:
            common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
            common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}

        if self.use_past:
            self.fill_with_past_key_values_(common_inputs, direction="inputs")
        return common_inputs

    # Copied from BartOnnxConfig._generate_dummy_inputs_for_sequence_classification_and_question_answering
    # A better name would be _generate_dummy_inputs_for_encoder_and_decoder because sequence classification and question
    # answering are not supported for M2M100, but this name is preserved to be able to check that the copy matches what
    # was done for BART so that it can be updated if need be.
    def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        # Copied from OnnxConfig.generate_dummy_inputs
        # Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )

        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )

        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
        common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
        return common_inputs

    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._generate_dummy_inputs_for_default_and_seq2seq_lm
    def _generate_dummy_inputs_for_default_and_seq2seq_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        # Generate decoder inputs
        decoder_seq_length = seq_length if not self.use_past else 1
        decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, decoder_seq_length, is_pair
        )
        decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
        common_inputs = dict(**encoder_inputs, **decoder_inputs)

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, encoder_seq_length = common_inputs["input_ids"].shape
            decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
            num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
            encoder_shape = (
                batch,
                num_encoder_attention_heads,
                encoder_seq_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )
            decoder_past_length = decoder_seq_length + 3
            decoder_shape = (
                batch,
                num_decoder_attention_heads,
                decoder_past_length,
                self._config.hidden_size // num_decoder_attention_heads,
            )

            common_inputs["decoder_attention_mask"] = torch.cat(
                [common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
            )

            common_inputs["past_key_values"] = []
            # If the number of encoder and decoder layers are present in the model configuration, both are considered
            num_encoder_layers, num_decoder_layers = self.num_layers
            min_num_layers = min(num_encoder_layers, num_decoder_layers)
            max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
            remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"

            for _ in range(min_num_layers):
                common_inputs["past_key_values"].append(
                    (
                        torch.zeros(decoder_shape),
                        torch.zeros(decoder_shape),
                        torch.zeros(encoder_shape),
                        torch.zeros(encoder_shape),
                    )
                )
            # TODO: test this.
            shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
            for _ in range(min_num_layers, max_num_layers):
                common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
        return common_inputs

    generate_dummy_inputs = _generate_dummy_inputs_for_default_and_seq2seq_lm


__all__ = ["M2M100Config", "M2M100OnnxConfig"]
@@ -14,8 +14,15 @@
# limitations under the License.
"""Marian model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any

from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...utils import logging
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import is_torch_available, logging


logger = logging.get_logger(__name__)
@@ -157,4 +164,243 @@ class MarianConfig(PreTrainedConfig):
        )


__all__ = ["MarianConfig"]
class MarianOnnxConfig(OnnxSeq2SeqConfigWithPast):
    @property
    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.inputs
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )

            if self.use_past:
                common_inputs["decoder_input_ids"] = {0: "batch"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
            else:
                common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}

            if self.use_past:
                self.fill_with_past_key_values_(common_inputs, direction="inputs")
        elif self.task == "causal-lm":
            # TODO: figure this case out.
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )
            if self.use_past:
                num_encoder_layers, _ = self.num_layers
                for i in range(num_encoder_layers):
                    common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        else:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                    ("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
                    ("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
                ]
            )

        return common_inputs

    @property
    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig.outputs
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_outputs = super().outputs
        else:
            common_outputs = super(OnnxConfigWithPast, self).outputs
            if self.use_past:
                num_encoder_layers, _ = self.num_layers
                for i in range(num_encoder_layers):
                    common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        return common_outputs

    def _generate_dummy_inputs_for_default_and_seq2seq_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        encoder_inputs = self._generate_dummy_inputs_for_encoder_and_decoder(
            tokenizer,
            batch_size,
            seq_length,
            is_pair,
        )

        # Generate decoder inputs
        decoder_seq_length = seq_length if not self.use_past else 1
        decoder_inputs = self._generate_dummy_inputs_for_encoder_and_decoder(
            tokenizer,
            batch_size,
            decoder_seq_length,
            is_pair,
        )
        decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
        common_inputs = dict(**encoder_inputs, **decoder_inputs)

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, encoder_seq_length = common_inputs["input_ids"].shape
            decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
            num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
            encoder_shape = (
                batch,
                num_encoder_attention_heads,
                encoder_seq_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )
            decoder_past_length = decoder_seq_length + 3
            decoder_shape = (
                batch,
                num_decoder_attention_heads,
                decoder_past_length,
                self._config.hidden_size // num_decoder_attention_heads,
            )

            common_inputs["decoder_attention_mask"] = torch.cat(
                [common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
            )

            common_inputs["past_key_values"] = []
            # If the number of encoder and decoder layers are present in the model configuration, both are considered
            num_encoder_layers, num_decoder_layers = self.num_layers
            min_num_layers = min(num_encoder_layers, num_decoder_layers)
            max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
            remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"

            for _ in range(min_num_layers):
                common_inputs["past_key_values"].append(
                    (
                        torch.zeros(decoder_shape),
                        torch.zeros(decoder_shape),
                        torch.zeros(encoder_shape),
                        torch.zeros(encoder_shape),
                    )
                )
            # TODO: test this.
            shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
            for _ in range(min_num_layers, max_num_layers):
                common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
        return common_inputs

    def _generate_dummy_inputs_for_causal_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = self._generate_dummy_inputs_for_encoder_and_decoder(
            tokenizer,
            batch_size,
            seq_length,
            is_pair,
        )

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, seqlen = common_inputs["input_ids"].shape
            # Not using the same length for past_key_values
            past_key_values_length = seqlen + 2
            num_encoder_layers, _ = self.num_layers
            num_encoder_attention_heads, _ = self.num_attention_heads
            past_shape = (
                batch,
                num_encoder_attention_heads,
                past_key_values_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )

            mask_dtype = common_inputs["attention_mask"].dtype
            common_inputs["attention_mask"] = torch.cat(
                [common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )
            common_inputs["past_key_values"] = [
                (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
            ]
        return common_inputs

    # Copied from BartOnnxConfig._generate_dummy_inputs_for_sequence_classification_and_question_answering
    # We renamed this function because Marian models do not have a sequence classification or question answering head
    def _generate_dummy_inputs_for_encoder_and_decoder(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        # Copied from OnnxConfig.generate_dummy_inputs
        # Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )

        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )

        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
        common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
        return common_inputs

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
                tokenizer,
                batch_size=batch_size,
                seq_length=seq_length,
                is_pair=is_pair,
            )

        else:
            common_inputs = self._generate_dummy_inputs_for_causal_lm(
                tokenizer,
                batch_size=batch_size,
                seq_length=seq_length,
                is_pair=is_pair,
            )

        return common_inputs

    # Copied from transformers.models.bart.configuration_bart.BartOnnxConfig._flatten_past_key_values_
    def _flatten_past_key_values_(self, flattened_output, name, idx, t):
        if self.task in ["default", "seq2seq-lm"]:
            flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
        else:
            flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
                flattened_output, name, idx, t
            )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["MarianConfig", "MarianOnnxConfig"]
@@ -14,8 +14,15 @@
# limitations under the License.
"""MBART model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any

from ... import PreTrainedTokenizer
from ...configuration_utils import PreTrainedConfig
from ...utils import logging
from ...onnx import OnnxConfig, OnnxConfigWithPast, OnnxSeq2SeqConfigWithPast
from ...onnx.utils import compute_effective_axis_dimension
from ...utils import is_torch_available, logging


logger = logging.get_logger(__name__)
@@ -157,4 +164,224 @@ class MBartConfig(PreTrainedConfig):
        )


__all__ = ["MBartConfig"]
# Copied from transformers.models.bart.configuration_bart.BartOnnxConfig with Bart->MBart
class MBartOnnxConfig(OnnxSeq2SeqConfigWithPast):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )

            if self.use_past:
                common_inputs["decoder_input_ids"] = {0: "batch"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
            else:
                common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
                common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}

            if self.use_past:
                self.fill_with_past_key_values_(common_inputs, direction="inputs")
        elif self.task == "causal-lm":
            # TODO: figure this case out.
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                ]
            )
            if self.use_past:
                num_encoder_layers, _ = self.num_layers
                for i in range(num_encoder_layers):
                    common_inputs[f"past_key_values.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_inputs[f"past_key_values.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        else:
            common_inputs = OrderedDict(
                [
                    ("input_ids", {0: "batch", 1: "encoder_sequence"}),
                    ("attention_mask", {0: "batch", 1: "encoder_sequence"}),
                    ("decoder_input_ids", {0: "batch", 1: "decoder_sequence"}),
                    ("decoder_attention_mask", {0: "batch", 1: "decoder_sequence"}),
                ]
            )

        return common_inputs

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task in ["default", "seq2seq-lm"]:
            common_outputs = super().outputs
        else:
            common_outputs = super(OnnxConfigWithPast, self).outputs
            if self.use_past:
                num_encoder_layers, _ = self.num_layers
                for i in range(num_encoder_layers):
                    common_outputs[f"present.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
                    common_outputs[f"present.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}
        return common_outputs

    def _generate_dummy_inputs_for_default_and_seq2seq_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        encoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        # Generate decoder inputs
        decoder_seq_length = seq_length if not self.use_past else 1
        decoder_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, decoder_seq_length, is_pair
        )
        decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
        common_inputs = dict(**encoder_inputs, **decoder_inputs)

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, encoder_seq_length = common_inputs["input_ids"].shape
            decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
            num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
            encoder_shape = (
                batch,
                num_encoder_attention_heads,
                encoder_seq_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )
            decoder_past_length = decoder_seq_length + 3
            decoder_shape = (
                batch,
                num_decoder_attention_heads,
                decoder_past_length,
                self._config.hidden_size // num_decoder_attention_heads,
            )

            common_inputs["decoder_attention_mask"] = torch.cat(
                [common_inputs["decoder_attention_mask"], torch.ones(batch, decoder_past_length)], dim=1
            )

            common_inputs["past_key_values"] = []
            # If the number of encoder and decoder layers are present in the model configuration, both are considered
            num_encoder_layers, num_decoder_layers = self.num_layers
            min_num_layers = min(num_encoder_layers, num_decoder_layers)
            max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
            remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"

            for _ in range(min_num_layers):
                common_inputs["past_key_values"].append(
                    (
                        torch.zeros(decoder_shape),
                        torch.zeros(decoder_shape),
                        torch.zeros(encoder_shape),
                        torch.zeros(encoder_shape),
                    )
                )
            # TODO: test this.
            shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
            for _ in range(min_num_layers, max_num_layers):
                common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))
        return common_inputs

    def _generate_dummy_inputs_for_causal_lm(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
            tokenizer, batch_size, seq_length, is_pair
        )

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch, seqlen = common_inputs["input_ids"].shape
            # Not using the same length for past_key_values
            past_key_values_length = seqlen + 2
            num_encoder_layers, _ = self.num_layers
            num_encoder_attention_heads, _ = self.num_attention_heads
            past_shape = (
                batch,
                num_encoder_attention_heads,
                past_key_values_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )

            mask_dtype = common_inputs["attention_mask"].dtype
            common_inputs["attention_mask"] = torch.cat(
                [common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)], dim=1
            )
            common_inputs["past_key_values"] = [
                (torch.zeros(past_shape), torch.zeros(past_shape)) for _ in range(num_encoder_layers)
            ]
        return common_inputs

    def _generate_dummy_inputs_for_sequence_classification_and_question_answering(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        # Copied from OnnxConfig.generate_dummy_inputs
        # Did not use super(OnnxConfigWithPast, self).generate_dummy_inputs for code clarity.
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )

        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = tokenizer.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )

        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([tokenizer.unk_token]) * seq_length] * batch_size
        common_inputs = dict(tokenizer(dummy_input, return_tensors="pt"))
        return common_inputs

    def generate_dummy_inputs(
        self,
        tokenizer: PreTrainedTokenizer,
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        if self.task in ["default", "seq2seq-lm"]:
            common_inputs = self._generate_dummy_inputs_for_default_and_seq2seq_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        elif self.task == "causal-lm":
            common_inputs = self._generate_dummy_inputs_for_causal_lm(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )
        else:
            common_inputs = self._generate_dummy_inputs_for_sequence_classification_and_question_answering(
                tokenizer, batch_size=batch_size, seq_length=seq_length, is_pair=is_pair
            )

        return common_inputs

    def _flatten_past_key_values_(self, flattened_output, name, idx, t):
        if self.task in ["default", "seq2seq-lm"]:
            flattened_output = super()._flatten_past_key_values_(flattened_output, name, idx, t)
        else:
            flattened_output = super(OnnxSeq2SeqConfigWithPast, self)._flatten_past_key_values_(
                flattened_output, name, idx, t
            )


__all__ = ["MBartConfig", "MBartOnnxConfig"]
@@ -4,6 +4,7 @@
# the file from the modular. If any change should be done, please apply the change to the
# modular_metaclip_2.py file directly. One of our CI enforces this.
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨

from ...configuration_utils import PreTrainedConfig
from ...utils import logging

@@ -14,7 +14,11 @@
# limitations under the License.
"""MobileBERT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -160,4 +164,21 @@ class MobileBertConfig(PreTrainedConfig):
        self.classifier_dropout = classifier_dropout


__all__ = ["MobileBertConfig"]
# Copied from transformers.models.bert.configuration_bert.BertOnnxConfig with Bert->MobileBert
class MobileBertOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("input_ids", dynamic_axis),
                ("attention_mask", dynamic_axis),
                ("token_type_ids", dynamic_axis),
            ]
        )


__all__ = ["MobileBertConfig", "MobileBertOnnxConfig"]
@@ -14,7 +14,13 @@
# limitations under the License.
"""MobileNetV1 model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -98,4 +104,23 @@ class MobileNetV1Config(PreTrainedConfig):
        self.layer_norm_eps = layer_norm_eps


__all__ = ["MobileNetV1Config"]
class MobileNetV1OnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict([("pixel_values", {0: "batch"})])

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "image-classification":
            return OrderedDict([("logits", {0: "batch"})])
        else:
            return OrderedDict([("last_hidden_state", {0: "batch"}), ("pooler_output", {0: "batch"})])

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["MobileNetV1Config", "MobileNetV1OnnxConfig"]
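The vision configs in this block all follow the same shape: `pixel_values` in, and either classification logits or hidden states out depending on the task. A quick check, assuming the classes are exported from `transformers.models.mobilenet_v1` as declared above:

```python
from transformers.models.mobilenet_v1 import MobileNetV1Config, MobileNetV1OnnxConfig

config = MobileNetV1Config()
# classification head: a single dynamic batch axis on the logits
print(dict(MobileNetV1OnnxConfig(config, task="image-classification").outputs))
# {'logits': {0: 'batch'}}
# default task: raw hidden states plus the pooled output
print(dict(MobileNetV1OnnxConfig(config).outputs))
# {'last_hidden_state': {0: 'batch'}, 'pooler_output': {0: 'batch'}}
```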
@@ -14,7 +14,13 @@
# limitations under the License.
"""MobileNetV2 model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -126,4 +132,23 @@ class MobileNetV2Config(PreTrainedConfig):
        self.semantic_loss_ignore_index = semantic_loss_ignore_index


__all__ = ["MobileNetV2Config"]
class MobileNetV2OnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict([("pixel_values", {0: "batch"})])

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "image-classification":
            return OrderedDict([("logits", {0: "batch"})])
        else:
            return OrderedDict([("last_hidden_state", {0: "batch"}), ("pooler_output", {0: "batch"})])

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["MobileNetV2Config", "MobileNetV2OnnxConfig"]
@@ -14,7 +14,13 @@
# limitations under the License.
"""MobileViT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -144,4 +150,23 @@ class MobileViTConfig(PreTrainedConfig):
        self.semantic_loss_ignore_index = semantic_loss_ignore_index


__all__ = ["MobileViTConfig"]
class MobileViTOnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict([("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"})])

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "image-classification":
            return OrderedDict([("logits", {0: "batch"})])
        else:
            return OrderedDict([("last_hidden_state", {0: "batch"}), ("pooler_output", {0: "batch"})])

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["MobileViTConfig", "MobileViTOnnxConfig"]
@@ -14,7 +14,13 @@
# limitations under the License.
"""MobileViTV2 model configuration"""

from collections import OrderedDict
from collections.abc import Mapping

from packaging import version

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -140,4 +146,23 @@ class MobileViTV2Config(PreTrainedConfig):
        self.semantic_loss_ignore_index = semantic_loss_ignore_index


__all__ = ["MobileViTV2Config"]
class MobileViTV2OnnxConfig(OnnxConfig):
    torch_onnx_minimum_version = version.parse("1.11")

    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict([("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"})])

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "image-classification":
            return OrderedDict([("logits", {0: "batch"})])
        else:
            return OrderedDict([("last_hidden_state", {0: "batch"}), ("pooler_output", {0: "batch"})])

    @property
    def atol_for_validation(self) -> float:
        return 1e-4


__all__ = ["MobileViTV2Config", "MobileViTV2OnnxConfig"]
@@ -14,7 +14,10 @@
# limitations under the License.
"""mT5 model configuration"""

from collections.abc import Mapping

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxSeq2SeqConfigWithPast
from ...utils import logging


@@ -145,4 +148,35 @@ class MT5Config(PreTrainedConfig):
        )


__all__ = ["MT5Config"]
class MT5OnnxConfig(OnnxSeq2SeqConfigWithPast):
    @property
    # Copied from transformers.models.t5.configuration_t5.T5OnnxConfig.inputs
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        common_inputs = {
            "input_ids": {0: "batch", 1: "encoder_sequence"},
            "attention_mask": {0: "batch", 1: "encoder_sequence"},
        }
        if self.use_past:
            common_inputs["attention_mask"][1] = "past_encoder_sequence + sequence"
            common_inputs["decoder_input_ids"] = {0: "batch"}
            common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
        else:
            common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
            common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}

        if self.use_past:
            self.fill_with_past_key_values_(common_inputs, direction="inputs")

        return common_inputs

    @property
    # Copied from transformers.models.t5.configuration_t5.T5OnnxConfig.default_onnx_opset
    def default_onnx_opset(self) -> int:
        return 13

    @property
    def atol_for_validation(self) -> float:
        return 5e-4


__all__ = ["MT5Config", "MT5OnnxConfig"]
@@ -14,7 +14,16 @@
# limitations under the License.
"""OWL-ViT model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import TYPE_CHECKING, Any


if TYPE_CHECKING:
    from ...processing_utils import ProcessorMixin

from ...configuration_utils import PreTrainedConfig
from ...onnx import OnnxConfig
from ...utils import logging


@@ -265,4 +274,52 @@ class OwlViTConfig(PreTrainedConfig):
        super().__init__(**kwargs)


__all__ = ["OwlViTConfig", "OwlViTTextConfig", "OwlViTVisionConfig"]
class OwlViTOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
                ("attention_mask", {0: "batch", 1: "sequence"}),
            ]
        )

    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        return OrderedDict(
            [
                ("logits_per_image", {0: "batch"}),
                ("logits_per_text", {0: "batch"}),
                ("text_embeds", {0: "batch"}),
                ("image_embeds", {0: "batch"}),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4

    def generate_dummy_inputs(
        self,
        processor: "ProcessorMixin",
        batch_size: int = -1,
        seq_length: int = -1,
    ) -> Mapping[str, Any]:
        text_input_dict = super().generate_dummy_inputs(
            processor.tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
        )
        image_input_dict = super().generate_dummy_inputs(
            processor.image_processor,
            batch_size=batch_size,
        )
        return {**text_input_dict, **image_input_dict}

    @property
    def default_onnx_opset(self) -> int:
        return 14


__all__ = ["OwlViTConfig", "OwlViTOnnxConfig", "OwlViTTextConfig", "OwlViTVisionConfig"]
@@ -14,7 +14,15 @@
# limitations under the License.
"""Perceiver model configuration"""

from collections import OrderedDict
from collections.abc import Mapping
from typing import Any, Union

from ...configuration_utils import PreTrainedConfig
from ...feature_extraction_utils import FeatureExtractionMixin
from ...onnx import OnnxConfig
from ...onnx.utils import compute_effective_axis_dimension
from ...tokenization_utils_base import PreTrainedTokenizerBase
from ...utils import logging


@@ -174,4 +182,63 @@ class PerceiverConfig(PreTrainedConfig):
        self._label_trainable_num_channels = _label_trainable_num_channels


__all__ = ["PerceiverConfig"]
class PerceiverOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        if self.task == "multiple-choice":
            dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
        else:
            dynamic_axis = {0: "batch", 1: "sequence"}
        return OrderedDict(
            [
                ("inputs", dynamic_axis),
                ("attention_mask", dynamic_axis),
            ]
        )

    @property
    def atol_for_validation(self) -> float:
        return 1e-4

    def generate_dummy_inputs(
        self,
        preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
        batch_size: int = -1,
        seq_length: int = -1,
        num_choices: int = -1,
        is_pair: bool = False,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
    ) -> Mapping[str, Any]:
        # copied from `transformers.onnx.config.OnnxConfig` and slightly altered/simplified

        if isinstance(preprocessor, PreTrainedTokenizerBase):
            # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
            batch_size = compute_effective_axis_dimension(
                batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
            )
            # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
            token_to_add = preprocessor.num_special_tokens_to_add(is_pair)
            seq_length = compute_effective_axis_dimension(
                seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
            )
            # Generate dummy inputs according to compute batch and sequence
            dummy_input = [" ".join(["a"]) * seq_length] * batch_size
            inputs = dict(preprocessor(dummy_input, return_tensors="pt"))
            inputs["inputs"] = inputs.pop("input_ids")
            return inputs
        elif isinstance(preprocessor, FeatureExtractionMixin) and preprocessor.model_input_names[0] == "pixel_values":
            # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
            batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
            dummy_input = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
            inputs = dict(preprocessor(images=dummy_input, return_tensors="pt"))
            inputs["inputs"] = inputs.pop("pixel_values")
            return inputs
        else:
            raise ValueError(
                "Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor."
            )


__all__ = ["PerceiverConfig", "PerceiverOnnxConfig"]
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""PLBART model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -161,4 +165,33 @@ class PLBartConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
class PLBartOnnxConfig(OnnxConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", {0: "batch", 1: "sequence"}),
|
||||
("attention_mask", {0: "batch", 1: "sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.use_past:
|
||||
return OrderedDict(
|
||||
[
|
||||
("last_hidden_state", {0: "batch", 1: "sequence"}),
|
||||
("past_keys", {0: "batch", 2: "sequence"}),
|
||||
("encoder_last_hidden_state", {0: "batch", 1: "sequence"}),
|
||||
]
|
||||
)
|
||||
else:
|
||||
return OrderedDict(
|
||||
[
|
||||
("last_hidden_state", {0: "batch", 1: "sequence"}),
|
||||
("encoder_last_hidden_state", {0: "batch", 1: "sequence"}),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["PLBartConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""PoolFormer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -123,4 +129,20 @@ class PoolFormerConfig(PreTrainedConfig):
|
||||
super().__init__(**kwargs)
|
||||
|
||||
|
||||
__all__ = ["PoolFormerConfig"]
|
||||
class PoolFormerOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 2e-3
|
||||
|
||||
|
||||
__all__ = ["PoolFormerConfig", "PoolFormerOnnxConfig"]
|
||||
|
@ -16,9 +16,13 @@
|
||||
# limitations under the License.
|
||||
"""Pvt model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Callable, Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -135,4 +139,24 @@ class PvtConfig(PreTrainedConfig):
|
||||
self.qkv_bias = qkv_bias
|
||||
|
||||
|
||||
__all__ = ["PvtConfig"]
|
||||
class PvtOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 12
|
||||
|
||||
|
||||
__all__ = ["PvtConfig", "PvtOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""RemBERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -135,4 +139,24 @@ class RemBertConfig(PreTrainedConfig):
|
||||
self.tie_word_embeddings = False
|
||||
|
||||
|
||||
__all__ = ["RemBertConfig"]
|
||||
class RemBertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["RemBertConfig", "RemBertOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""ResNet model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
|
||||
|
||||
@ -111,4 +117,20 @@ class ResNetConfig(BackboneConfigMixin, PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["ResNetConfig"]
|
||||
class ResNetOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-3
|
||||
|
||||
|
||||
__all__ = ["ResNetConfig", "ResNetOnnxConfig"]
|
||||
|
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""RoBERTa configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -125,4 +129,19 @@ class RobertaConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["RobertaConfig"]
|
||||
class RobertaOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["RobertaConfig", "RobertaOnnxConfig"]
|
||||
|
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""RoBERTa-PreLayerNorm configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -126,4 +130,20 @@ class RobertaPreLayerNormConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["RobertaPreLayerNormConfig"]
|
||||
# Copied from transformers.models.roberta.configuration_roberta.RobertaOnnxConfig with Roberta->RobertaPreLayerNorm
|
||||
class RobertaPreLayerNormOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["RobertaPreLayerNormConfig", "RobertaPreLayerNormOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""RoFormer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -126,4 +130,21 @@ class RoFormerConfig(PreTrainedConfig):
|
||||
self.use_cache = use_cache
|
||||
|
||||
|
||||
__all__ = ["RoFormerConfig"]
|
||||
class RoFormerOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["RoFormerConfig", "RoFormerOnnxConfig"]
|
||||
|
@ -15,8 +15,13 @@
|
||||
"""SegFormer model configuration"""
|
||||
|
||||
import warnings
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -143,4 +148,24 @@ class SegformerConfig(PreTrainedConfig):
|
||||
self.semantic_loss_ignore_index = semantic_loss_ignore_index
|
||||
|
||||
|
||||
__all__ = ["SegformerConfig"]
|
||||
class SegformerOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 12
|
||||
|
||||
|
||||
__all__ = ["SegformerConfig", "SegformerOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""SqueezeBERT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -143,4 +147,21 @@ class SqueezeBertConfig(PreTrainedConfig):
|
||||
self.output_groups = output_groups
|
||||
|
||||
|
||||
__all__ = ["SqueezeBertConfig"]
|
||||
# # Copied from transformers.models.bert.configuration_bert.BertOnxxConfig with Bert->SqueezeBert
|
||||
class SqueezeBertOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["SqueezeBertConfig", "SqueezeBertOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""SwiftFormer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -123,4 +129,20 @@ class SwiftFormerConfig(PreTrainedConfig):
|
||||
self.batch_norm_eps = batch_norm_eps
|
||||
|
||||
|
||||
__all__ = ["SwiftFormerConfig"]
|
||||
class SwiftFormerOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["SwiftFormerConfig", "SwiftFormerOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""Swin Transformer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ...utils.backbone_utils import BackboneConfigMixin, get_aligned_output_features_output_indices
|
||||
|
||||
@ -154,4 +160,20 @@ class SwinConfig(BackboneConfigMixin, PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["SwinConfig"]
|
||||
class SwinOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["SwinConfig", "SwinOnnxConfig"]
|
||||
|
@ -14,7 +14,10 @@
|
||||
# limitations under the License.
|
||||
"""T5 model configuration"""
|
||||
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxSeq2SeqConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -140,4 +143,29 @@ class T5Config(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["T5Config"]
|
||||
class T5OnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = {
|
||||
"input_ids": {0: "batch", 1: "encoder_sequence"},
|
||||
"attention_mask": {0: "batch", 1: "encoder_sequence"},
|
||||
}
|
||||
if self.use_past:
|
||||
common_inputs["attention_mask"][1] = "past_encoder_sequence + sequence"
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
|
||||
__all__ = ["T5Config", "T5OnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""Table Transformer model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ...utils.backbone_utils import verify_backbone_config_arguments
|
||||
from ..auto import CONFIG_MAPPING, AutoConfig
|
||||
@ -241,4 +247,26 @@ class TableTransformerConfig(PreTrainedConfig):
|
||||
super().__init__(is_encoder_decoder=is_encoder_decoder, **kwargs)
|
||||
|
||||
|
||||
__all__ = ["TableTransformerConfig"]
|
||||
# Copied from transformers.models.detr.configuration_detr.DetrOnnxConfig
|
||||
class TableTransformerOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
("pixel_mask", {0: "batch"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-5
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 12
|
||||
|
||||
|
||||
__all__ = ["TableTransformerConfig", "TableTransformerOnnxConfig"]
|
||||
|
@ -14,7 +14,10 @@
|
||||
# limitations under the License.
|
||||
"""UMT5 model configuration"""
|
||||
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxSeq2SeqConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -144,4 +147,35 @@ class UMT5Config(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["UMT5Config"]
|
||||
class UMT5OnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
# Copied from transformers.models.t5.configuration_t5.T5OnnxConfig.inputs
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = {
|
||||
"input_ids": {0: "batch", 1: "encoder_sequence"},
|
||||
"attention_mask": {0: "batch", 1: "encoder_sequence"},
|
||||
}
|
||||
if self.use_past:
|
||||
common_inputs["attention_mask"][1] = "past_encoder_sequence + sequence"
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
common_inputs["decoder_attention_mask"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
|
||||
return common_inputs
|
||||
|
||||
@property
|
||||
# Copied from transformers.models.t5.configuration_t5.T5OnnxConfig.default_onnx_opset
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 13
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 5e-4
|
||||
|
||||
|
||||
__all__ = ["UMT5Config", "UMT5OnnxConfig"]
|
||||
|
@ -14,11 +14,21 @@
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import TYPE_CHECKING, Any
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
from ..auto.configuration_auto import AutoConfig
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ... import PreTrainedTokenizerBase
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
|
||||
@ -108,4 +118,100 @@ class VisionEncoderDecoderConfig(PreTrainedConfig):
|
||||
return cls(encoder=encoder_config.to_dict(), decoder=decoder_config.to_dict(), **kwargs)
|
||||
|
||||
|
||||
__all__ = ["VisionEncoderDecoderConfig"]
|
||||
class VisionEncoderDecoderEncoderOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict({"last_hidden_state": {0: "batch", 1: "encoder_sequence"}})
|
||||
|
||||
|
||||
class VisionEncoderDecoderDecoderOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = OrderedDict()
|
||||
common_inputs["input_ids"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
common_inputs["attention_mask"] = {0: "batch", 1: "past_decoder_sequence + sequence"}
|
||||
common_inputs["encoder_hidden_states"] = {0: "batch", 1: "encoder_sequence"}
|
||||
|
||||
return common_inputs
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
tokenizer: "PreTrainedTokenizerBase",
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
) -> Mapping[str, Any]:
|
||||
import torch
|
||||
|
||||
common_inputs = OrderedDict()
|
||||
|
||||
dummy_input = super().generate_dummy_inputs(
|
||||
tokenizer,
|
||||
batch_size=batch_size,
|
||||
seq_length=seq_length,
|
||||
is_pair=is_pair,
|
||||
)
|
||||
|
||||
batch, encoder_sequence = dummy_input["input_ids"].shape
|
||||
encoder_hidden_states_shape = (batch, encoder_sequence, self._config.encoder_hidden_size)
|
||||
common_inputs["input_ids"] = dummy_input.pop("input_ids")
|
||||
common_inputs["attention_mask"] = dummy_input.pop("attention_mask")
|
||||
common_inputs["encoder_hidden_states"] = torch.zeros(encoder_hidden_states_shape)
|
||||
|
||||
return common_inputs
|
||||
|
||||
|
||||
class VisionEncoderDecoderOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> None:
|
||||
pass
|
||||
|
||||
def get_encoder_config(self, encoder_config: PreTrainedConfig) -> OnnxConfig:
|
||||
r"""
|
||||
Returns ONNX encoder config for `VisionEncoderDecoder` model.
|
||||
|
||||
Args:
|
||||
encoder_config (`PreTrainedConfig`):
|
||||
The encoder model's configuration to use when exporting to ONNX.
|
||||
|
||||
Returns:
|
||||
[`VisionEncoderDecoderEncoderOnnxConfig`]: An instance of the ONNX configuration object
|
||||
"""
|
||||
return VisionEncoderDecoderEncoderOnnxConfig(encoder_config)
|
||||
|
||||
def get_decoder_config(
|
||||
self, encoder_config: PreTrainedConfig, decoder_config: PreTrainedConfig, feature: str = "default"
|
||||
) -> OnnxConfig:
|
||||
r"""
|
||||
Returns ONNX decoder config for `VisionEncoderDecoder` model.
|
||||
|
||||
Args:
|
||||
encoder_config (`PreTrainedConfig`):
|
||||
The encoder model's configuration to use when exporting to ONNX.
|
||||
decoder_config (`PreTrainedConfig`):
|
||||
The decoder model's configuration to use when exporting to ONNX
|
||||
feature (`str`, *optional*):
|
||||
The type of feature to export the model with.
|
||||
|
||||
Returns:
|
||||
[`VisionEncoderDecoderDecoderOnnxConfig`]: An instance of the ONNX configuration object.
|
||||
"""
|
||||
decoder_config.encoder_hidden_size = encoder_config.hidden_size
|
||||
return VisionEncoderDecoderDecoderOnnxConfig(decoder_config, feature)
|
||||
|
||||
|
||||
__all__ = ["VisionEncoderDecoderConfig", "VisionEncoderDecoderOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""ViT model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -124,4 +130,20 @@ class ViTConfig(PreTrainedConfig):
|
||||
self.pooler_act = pooler_act
|
||||
|
||||
|
||||
__all__ = ["ViTConfig"]
|
||||
class ViTOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
|
||||
__all__ = ["ViTConfig", "ViTOnnxConfig"]
|
||||
|
@ -14,10 +14,19 @@
|
||||
# limitations under the License.
|
||||
"""Whisper model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
from typing import TYPE_CHECKING, Any, Union
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig, OnnxSeq2SeqConfigWithPast
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ...feature_extraction_utils import FeatureExtractionMixin
|
||||
from ...tokenization_utils_base import PreTrainedTokenizerBase
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
|
||||
@ -276,4 +285,64 @@ class WhisperConfig(PreTrainedConfig):
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["WhisperConfig"]
|
||||
class WhisperOnnxConfig(OnnxSeq2SeqConfigWithPast):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_inputs = OrderedDict(
|
||||
[
|
||||
("input_features", {0: "batch", 1: "feature_size", 2: "encoder_sequence"}),
|
||||
]
|
||||
)
|
||||
if self.use_past:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch"}
|
||||
else:
|
||||
common_inputs["decoder_input_ids"] = {0: "batch", 1: "decoder_sequence"}
|
||||
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_inputs, direction="inputs")
|
||||
|
||||
return common_inputs
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
is_pair: bool = False,
|
||||
sampling_rate: int = 22050,
|
||||
time_duration: float = 5.0,
|
||||
frequency: int = 220,
|
||||
) -> Mapping[str, Any]:
|
||||
dummy_inputs = OrderedDict()
|
||||
encoder_inputs = OnnxConfig.generate_dummy_inputs(
|
||||
self,
|
||||
preprocessor=preprocessor.feature_extractor,
|
||||
batch_size=batch_size,
|
||||
sampling_rate=sampling_rate,
|
||||
time_duration=time_duration,
|
||||
frequency=frequency,
|
||||
)
|
||||
encoder_sequence_length = encoder_inputs["input_features"].shape[2]
|
||||
seq_length = encoder_sequence_length // 2 if self.use_past else seq_length
|
||||
|
||||
decoder_inputs = super().generate_dummy_inputs(
|
||||
preprocessor.tokenizer,
|
||||
batch_size,
|
||||
seq_length,
|
||||
is_pair,
|
||||
)
|
||||
|
||||
dummy_inputs["input_features"] = encoder_inputs.pop("input_features")
|
||||
dummy_inputs["decoder_input_ids"] = decoder_inputs.pop("decoder_input_ids")
|
||||
|
||||
if "past_key_values" in decoder_inputs:
|
||||
dummy_inputs["past_key_values"] = decoder_inputs.pop("past_key_values")
|
||||
|
||||
return dummy_inputs
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-3
|
||||
|
||||
|
||||
__all__ = ["WhisperConfig", "WhisperOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""XLM configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -217,4 +221,21 @@ class XLMConfig(PreTrainedConfig):
|
||||
super().__init__(pad_token_id=pad_token_id, bos_token_id=bos_token_id, **kwargs)
|
||||
|
||||
|
||||
__all__ = ["XLMConfig"]
|
||||
# Copied from transformers.models.bert.configuration_bert.BertOnnxConfig
|
||||
class XLMOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
("token_type_ids", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["XLMConfig", "XLMOnnxConfig"]
|
||||
|
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""XLM-RoBERTa configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -126,4 +130,20 @@ class XLMRobertaConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["XLMRobertaConfig"]
|
||||
# Copied from transformers.models.roberta.configuration_roberta.RobertaOnnxConfig with Roberta->XLMRoberta
|
||||
class XLMRobertaOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["XLMRobertaConfig", "XLMRobertaOnnxConfig"]
|
||||
|
@ -14,7 +14,11 @@
|
||||
# limitations under the License.
|
||||
"""XLM_ROBERTa_XL configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -122,4 +126,20 @@ class XLMRobertaXLConfig(PreTrainedConfig):
|
||||
self.classifier_dropout = classifier_dropout
|
||||
|
||||
|
||||
__all__ = ["XLMRobertaXLConfig"]
|
||||
# Copied from transformers.models.roberta.configuration_roberta.RobertaOnnxConfig with Roberta->XLMRobertaXL
|
||||
class XLMRobertaXLOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["XLMRobertaXLConfig", "XLMRobertaXLOnnxConfig"]
|
||||
|
@ -15,7 +15,11 @@
|
||||
# limitations under the License.
|
||||
"""X-MOD configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -154,4 +158,20 @@ class XmodConfig(PreTrainedConfig):
|
||||
self.default_language = default_language
|
||||
|
||||
|
||||
__all__ = ["XmodConfig"]
|
||||
# Copied from transformers.models.roberta.configuration_roberta.RobertaOnnxConfig with Roberta->Xmod
|
||||
class XmodOnnxConfig(OnnxConfig):
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
if self.task == "multiple-choice":
|
||||
dynamic_axis = {0: "batch", 1: "choice", 2: "sequence"}
|
||||
else:
|
||||
dynamic_axis = {0: "batch", 1: "sequence"}
|
||||
return OrderedDict(
|
||||
[
|
||||
("input_ids", dynamic_axis),
|
||||
("attention_mask", dynamic_axis),
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
__all__ = ["XmodConfig", "XmodOnnxConfig"]
|
||||
|
@ -14,7 +14,13 @@
|
||||
# limitations under the License.
|
||||
"""YOLOS model configuration"""
|
||||
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Mapping
|
||||
|
||||
from packaging import version
|
||||
|
||||
from ...configuration_utils import PreTrainedConfig
|
||||
from ...onnx import OnnxConfig
|
||||
from ...utils import logging
|
||||
|
||||
|
||||
@ -149,4 +155,24 @@ class YolosConfig(PreTrainedConfig):
|
||||
self.eos_coefficient = eos_coefficient
|
||||
|
||||
|
||||
__all__ = ["YolosConfig"]
|
||||
class YolosOnnxConfig(OnnxConfig):
|
||||
torch_onnx_minimum_version = version.parse("1.11")
|
||||
|
||||
@property
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
return OrderedDict(
|
||||
[
|
||||
("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
|
||||
]
|
||||
)
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
return 1e-4
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
return 12
|
||||
|
||||
|
||||
__all__ = ["YolosConfig", "YolosOnnxConfig"]
|
||||
|
45
src/transformers/onnx/__init__.py
Normal file
45
src/transformers/onnx/__init__.py
Normal file
@ -0,0 +1,45 @@
|
||||
# Copyright 2020 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
from typing import TYPE_CHECKING
|
||||
|
||||
from ..utils import _LazyModule
|
||||
|
||||
|
||||
_import_structure = {
|
||||
"config": [
|
||||
"EXTERNAL_DATA_FORMAT_SIZE_LIMIT",
|
||||
"OnnxConfig",
|
||||
"OnnxConfigWithPast",
|
||||
"OnnxSeq2SeqConfigWithPast",
|
||||
"PatchingSpec",
|
||||
],
|
||||
"utils": ["ParameterFormat", "compute_serialized_parameters_size"],
|
||||
}
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from .config import (
|
||||
EXTERNAL_DATA_FORMAT_SIZE_LIMIT,
|
||||
OnnxConfig,
|
||||
OnnxConfigWithPast,
|
||||
OnnxSeq2SeqConfigWithPast,
|
||||
PatchingSpec,
|
||||
)
|
||||
from .utils import ParameterFormat, compute_serialized_parameters_size
|
||||
|
||||
else:
|
||||
import sys
|
||||
|
||||
sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
|
748
src/transformers/onnx/config.py
Normal file
748
src/transformers/onnx/config.py
Normal file
@ -0,0 +1,748 @@
|
||||
# Copyright 2021 The HuggingFace Team. All rights reserved.
|
||||
#
|
||||
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||
# you may not use this file except in compliance with the License.
|
||||
# You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
import copy
|
||||
import dataclasses
|
||||
import warnings
|
||||
from abc import ABC, abstractmethod
|
||||
from collections import OrderedDict
|
||||
from collections.abc import Callable, Iterable, Mapping
|
||||
from typing import TYPE_CHECKING, Any, Optional, Union
|
||||
|
||||
import numpy as np
|
||||
from packaging import version
|
||||
|
||||
from ..utils import is_torch_available, is_vision_available, logging
|
||||
from .utils import ParameterFormat, compute_effective_axis_dimension, compute_serialized_parameters_size
|
||||
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from ..configuration_utils import PreTrainedConfig
|
||||
from ..feature_extraction_utils import FeatureExtractionMixin
|
||||
from ..image_processing_utils import ImageProcessingMixin
|
||||
from ..tokenization_utils_base import PreTrainedTokenizerBase
|
||||
|
||||
|
||||
if is_vision_available():
|
||||
from PIL import Image
|
||||
|
||||
logger = logging.get_logger(__name__)
|
||||
|
||||
|
||||
DEFAULT_ONNX_OPSET = 11
|
||||
|
||||
# 2 Gb
|
||||
EXTERNAL_DATA_FORMAT_SIZE_LIMIT = 2 * 1024 * 1024 * 1024
|
||||
|
||||
|
||||
@dataclasses.dataclass
|
||||
class PatchingSpec:
|
||||
"""
|
||||
Data class that holds patching specifications.
|
||||
|
||||
Args:
|
||||
o: Module / object where the op to patch is located
|
||||
name: Name of the op to monkey patch
|
||||
custom_op: Custom op that patches the original op
|
||||
orig_op: Original op that is being patched
|
||||
op_wrapper: Wrapper (optional) that wraps both the original and custom ops.
|
||||
It is useful for ops that are class or static methods for instance.
|
||||
"""
|
||||
|
||||
o: Any
|
||||
name: str
|
||||
custom_op: Callable
|
||||
orig_op: Optional[Callable] = None
|
||||
op_wrapper: Optional[Callable] = None
|
||||
|
||||
|
||||
class OnnxConfig(ABC):
|
||||
"""
|
||||
Base class for ONNX exportable model describing metadata on how to export the model through the ONNX format.
|
||||
"""
|
||||
|
||||
default_fixed_batch = 2
|
||||
default_fixed_sequence = 8
|
||||
default_fixed_num_choices = 4
|
||||
torch_onnx_minimum_version = version.parse("1.8")
|
||||
_tasks_to_common_outputs = {
|
||||
"causal-lm": OrderedDict({"logits": {0: "batch", 1: "sequence"}}),
|
||||
"default": OrderedDict({"last_hidden_state": {0: "batch", 1: "sequence"}}),
|
||||
"image-classification": OrderedDict({"logits": {0: "batch", 1: "sequence"}}),
|
||||
"image-segmentation": OrderedDict(
|
||||
{
|
||||
"logits": {0: "batch", 1: "sequence"},
|
||||
"pred_boxes": {0: "batch", 1: "sequence"},
|
||||
"pred_masks": {0: "batch", 1: "sequence"},
|
||||
}
|
||||
),
|
||||
"masked-im": OrderedDict({"logits": {0: "batch", 1: "sequence"}}),
|
||||
"masked-lm": OrderedDict({"logits": {0: "batch", 1: "sequence"}}),
|
||||
"multiple-choice": OrderedDict({"logits": {0: "batch"}}),
|
||||
"object-detection": OrderedDict(
|
||||
{
|
||||
"logits": {0: "batch", 1: "sequence"},
|
||||
"pred_boxes": {0: "batch", 1: "sequence"},
|
||||
}
|
||||
),
|
||||
"question-answering": OrderedDict(
|
||||
{
|
||||
"start_logits": {0: "batch", 1: "sequence"},
|
||||
"end_logits": {0: "batch", 1: "sequence"},
|
||||
}
|
||||
),
|
||||
"semantic-segmentation": OrderedDict({"logits": {0: "batch", 1: "num_labels", 2: "height", 3: "width"}}),
|
||||
"seq2seq-lm": OrderedDict({"logits": {0: "batch", 1: "decoder_sequence"}}),
|
||||
"sequence-classification": OrderedDict({"logits": {0: "batch"}}),
|
||||
"token-classification": OrderedDict({"logits": {0: "batch", 1: "sequence"}}),
|
||||
"vision2seq-lm": OrderedDict({"logits": {0: "batch", 1: "sequence"}}),
|
||||
"speech2seq-lm": OrderedDict({"logits": {0: "batch", 1: "sequence"}}),
|
||||
}
|
||||
|
||||
def __init__(
|
||||
self, config: "PreTrainedConfig", task: str = "default", patching_specs: Optional[list[PatchingSpec]] = None
|
||||
):
|
||||
self._config = config
|
||||
|
||||
if task not in self._tasks_to_common_outputs:
|
||||
raise ValueError(
|
||||
f"{task} is not a supported task, supported tasks: {self._tasks_to_common_outputs.keys()}"
|
||||
)
|
||||
self.task = task
|
||||
|
||||
self._patching_specs = []
|
||||
for spec in patching_specs if patching_specs is not None else []:
|
||||
final_spec = spec
|
||||
if spec.orig_op is None:
|
||||
final_spec = dataclasses.replace(spec, orig_op=getattr(spec.o, spec.name))
|
||||
self._patching_specs.append(final_spec)
|
||||
|
||||
@classmethod
|
||||
def from_model_config(cls, config: "PreTrainedConfig", task: str = "default") -> "OnnxConfig":
|
||||
"""
|
||||
Instantiate a OnnxConfig for a specific model
|
||||
|
||||
Args:
|
||||
config: The model's configuration to use when exporting to ONNX
|
||||
|
||||
Returns:
|
||||
OnnxConfig for this model
|
||||
"""
|
||||
return cls(config, task=task)
|
||||
|
||||
@property
|
||||
@abstractmethod
|
||||
def inputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
"""
|
||||
Mapping containing the axis definition of the input tensors to provide to the model
|
||||
|
||||
Returns:
|
||||
For each input: its name associated to the axes symbolic name and the axis position within the tensor
|
||||
"""
|
||||
raise NotImplementedError()
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
"""
|
||||
Mapping containing the axis definition of the output tensors to provide to the model
|
||||
|
||||
Returns:
|
||||
For each output: its name associated to the axes symbolic name and the axis position within the tensor
|
||||
"""
|
||||
common_outputs = self._tasks_to_common_outputs[self.task]
|
||||
return copy.deepcopy(common_outputs)
|
||||
|
||||
@property
|
||||
def values_override(self) -> Optional[Mapping[str, Any]]:
|
||||
"""
|
||||
Dictionary of keys to override in the model's config before exporting
|
||||
|
||||
Returns:
|
||||
Dictionary with the keys (and their corresponding values) to override
|
||||
"""
|
||||
if hasattr(self._config, "use_cache"):
|
||||
return {"use_cache": False}
|
||||
|
||||
return None
|
||||
|
||||
@property
|
||||
def default_batch_size(self) -> int:
|
||||
"""
|
||||
The default batch size to use if no other indication
|
||||
|
||||
Returns:
|
||||
Integer > 0
|
||||
"""
|
||||
# Using 2 avoid ONNX making assumption about single sample batch
|
||||
return OnnxConfig.default_fixed_batch
|
||||
|
||||
@property
|
||||
def default_sequence_length(self) -> int:
|
||||
"""
|
||||
The default sequence length to use if no other indication
|
||||
|
||||
Returns:
|
||||
Integer > 0
|
||||
"""
|
||||
return OnnxConfig.default_fixed_sequence
|
||||
|
||||
@property
|
||||
def default_num_choices(self) -> int:
|
||||
"""
|
||||
The default number of choices to use if no other indication
|
||||
|
||||
Returns:
|
||||
Integer > 0
|
||||
"""
|
||||
return OnnxConfig.default_fixed_num_choices
|
||||
|
||||
@property
|
||||
def default_onnx_opset(self) -> int:
|
||||
"""
|
||||
Which onnx opset to use when exporting the model
|
||||
|
||||
Returns:
|
||||
Integer ONNX Opset version
|
||||
"""
|
||||
return DEFAULT_ONNX_OPSET
|
||||
|
||||
@property
|
||||
def atol_for_validation(self) -> float:
|
||||
"""
|
||||
What absolute tolerance value to use during model conversion validation.
|
||||
|
||||
Returns:
|
||||
Float absolute tolerance value.
|
||||
"""
|
||||
return 1e-5
|
||||
|
||||
@property
|
||||
def is_torch_support_available(self) -> bool:
|
||||
"""
|
||||
The minimum PyTorch version required to export the model.
|
||||
|
||||
Returns:
|
||||
`bool`: Whether the installed version of PyTorch is compatible with the model.
|
||||
"""
|
||||
if is_torch_available():
|
||||
from transformers.utils import get_torch_version
|
||||
|
||||
return version.parse(get_torch_version()) >= self.torch_onnx_minimum_version
|
||||
else:
|
||||
return False
|
||||
|
||||
@staticmethod
|
||||
def use_external_data_format(num_parameters: int) -> bool:
|
||||
"""
|
||||
Flag indicating if the model requires using external data format
|
||||
|
||||
Args:
|
||||
num_parameters: Number of parameter on the model
|
||||
|
||||
Returns:
|
||||
True if model.num_parameters() * size_of(float32) >= 2Gb False otherwise
|
||||
"""
|
||||
|
||||
return (
|
||||
compute_serialized_parameters_size(num_parameters, ParameterFormat.Float)
|
||||
>= EXTERNAL_DATA_FORMAT_SIZE_LIMIT
|
||||
)
|
||||
|
||||
def _generate_dummy_images(
|
||||
self, batch_size: int = 2, num_channels: int = 3, image_height: int = 40, image_width: int = 40
|
||||
):
|
||||
images = []
|
||||
for _ in range(batch_size):
|
||||
data = np.random.rand(image_height, image_width, num_channels) * 255
|
||||
images.append(Image.fromarray(data.astype("uint8")).convert("RGB"))
|
||||
return images
|
||||
|
||||
def _generate_dummy_audio(
|
||||
self, batch_size: int = 2, sampling_rate: int = 22050, time_duration: float = 5.0, frequency: int = 220
|
||||
):
|
||||
audio_data = []
|
||||
for _ in range(batch_size):
|
||||
# time variable
|
||||
t = np.linspace(0, time_duration, int(time_duration * sampling_rate), endpoint=False)
|
||||
|
||||
# generate pure sine wave at `frequency` Hz
|
||||
audio_data.append(0.5 * np.sin(2 * np.pi * frequency * t))
|
||||
|
||||
return audio_data
|
||||
|
||||
def generate_dummy_inputs(
|
||||
self,
|
||||
preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin", "ImageProcessingMixin"],
|
||||
batch_size: int = -1,
|
||||
seq_length: int = -1,
|
||||
num_choices: int = -1,
|
||||
is_pair: bool = False,
|
||||
num_channels: int = 3,
|
||||
image_width: int = 40,
|
||||
image_height: int = 40,
|
||||
sampling_rate: int = 22050,
|
||||
time_duration: float = 5.0,
|
||||
frequency: int = 220,
|
||||
tokenizer: Optional["PreTrainedTokenizerBase"] = None,
|
||||
) -> Mapping[str, Any]:
|
||||
"""
|
||||
Generate inputs to provide to the ONNX exporter
|
||||
|
||||
Args:
|
||||
preprocessor: ([`PreTrainedTokenizerBase`], [`FeatureExtractionMixin`], or [`ImageProcessingMixin`]):
|
||||
The preprocessor associated with this model configuration.
|
||||
batch_size (`int`, *optional*, defaults to -1):
|
||||
The batch size to export the model for (-1 means dynamic axis).
|
||||
num_choices (`int`, *optional*, defaults to -1):
|
||||
The number of candidate answers provided for multiple choice task (-1 means dynamic axis).
|
||||
seq_length (`int`, *optional*, defaults to -1):
|
||||
The sequence length to export the model for (-1 means dynamic axis).
|
||||
is_pair (`bool`, *optional*, defaults to `False`):
|
||||
Indicate if the input is a pair (sentence 1, sentence 2)
|
||||
num_channels (`int`, *optional*, defaults to 3):
|
||||
The number of channels of the generated images.
|
||||
image_width (`int`, *optional*, defaults to 40):
|
||||
The width of the generated images.
|
||||
image_height (`int`, *optional*, defaults to 40):
|
||||
The height of the generated images.
|
||||
sampling_rate (`int`, *optional* defaults to 22050)
|
||||
The sampling rate for audio data generation.
|
||||
time_duration (`float`, *optional* defaults to 5.0)
|
||||
Total seconds of sampling for audio data generation.
|
||||
frequency (`int`, *optional* defaults to 220)
|
||||
The desired natural frequency of generated audio.
|
||||
|
||||
Returns:
|
||||
Mapping[str, Tensor] holding the kwargs to provide to the model's forward function
|
||||
"""
|
||||
from ..feature_extraction_utils import FeatureExtractionMixin
|
||||
from ..image_processing_utils import ImageProcessingMixin
|
||||
from ..tokenization_utils_base import PreTrainedTokenizerBase
|
||||
|
||||
if isinstance(preprocessor, PreTrainedTokenizerBase) and tokenizer is not None:
|
||||
raise ValueError("You cannot provide both a tokenizer and a preprocessor to generate dummy inputs.")
|
||||
if tokenizer is not None:
|
||||
warnings.warn(
|
||||
"The `tokenizer` argument is deprecated and will be removed in version 5 of Transformers. Use"
|
||||
" `preprocessor` instead.",
|
||||
FutureWarning,
|
||||
)
|
||||
logger.warning("Overwriting the `preprocessor` argument with `tokenizer` to generate dummy inputs.")
|
||||
preprocessor = tokenizer
|
||||
if isinstance(preprocessor, PreTrainedTokenizerBase):
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(
|
||||
batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
|
||||
)
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
|
||||
token_to_add = preprocessor.num_special_tokens_to_add(is_pair)
|
||||
seq_length = compute_effective_axis_dimension(
|
||||
seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
|
||||
)
|
||||
# Generate dummy inputs according to compute batch and sequence
|
||||
input_token = (
|
||||
preprocessor.unk_token
|
||||
if (preprocessor.unk_token is not None and len(preprocessor.unk_token) > 0)
|
||||
else "0"
|
||||
)
|
||||
dummy_input = [" ".join([input_token]) * seq_length] * batch_size
|
||||
if self.task == "multiple-choice":
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 4 candidate answers to avoid optimizations
|
||||
# made by ONNX
|
||||
num_choices = compute_effective_axis_dimension(
|
||||
num_choices, fixed_dimension=OnnxConfig.default_fixed_num_choices, num_token_to_add=0
|
||||
)
|
||||
dummy_input = dummy_input * num_choices
|
||||
# The shape of the tokenized inputs values is [batch_size * num_choices, seq_length]
|
||||
tokenized_input = preprocessor(dummy_input, text_pair=dummy_input)
|
||||
# Unflatten the tokenized inputs values expanding it to the shape [batch_size, num_choices, seq_length]
|
||||
for k, v in tokenized_input.items():
|
||||
tokenized_input[k] = [v[i : i + num_choices] for i in range(0, len(v), num_choices)]
|
||||
return dict(tokenized_input.convert_to_tensors(tensor_type="pt"))
|
||||
return dict(preprocessor(dummy_input, return_tensors="pt"))
|
||||
elif isinstance(preprocessor, ImageProcessingMixin):
|
||||
if preprocessor.model_input_names[0] != "pixel_values":
|
||||
raise ValueError(
|
||||
f"The `preprocessor` is an image processor ({preprocessor.__class__.__name__}) and expects"
|
||||
f' `model_input_names[0]` to be "pixel_values", but got {preprocessor.model_input_names[0]}'
|
||||
)
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
|
||||
dummy_input = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
|
||||
return dict(preprocessor(images=dummy_input, return_tensors="pt"))
|
||||
elif isinstance(preprocessor, FeatureExtractionMixin) and preprocessor.model_input_names[0] == "pixel_values":
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
|
||||
dummy_input = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
|
||||
return dict(preprocessor(images=dummy_input, return_tensors="pt"))
|
||||
elif (
|
||||
isinstance(preprocessor, FeatureExtractionMixin) and preprocessor.model_input_names[0] == "input_features"
|
||||
):
|
||||
# If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
|
||||
batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
|
||||
dummy_input = self._generate_dummy_audio(batch_size, sampling_rate, time_duration, frequency)
|
||||
return dict(preprocessor(dummy_input, return_tensors="pt"))
|
||||
else:
|
||||
raise ValueError(
|
||||
"Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor."
|
||||
)
|
||||
|
||||
def generate_dummy_inputs_onnxruntime(self, reference_model_inputs: Mapping[str, Any]) -> Mapping[str, Any]:
|
||||
"""
|
||||
Generate inputs for ONNX Runtime using the reference model inputs. Override this to run inference with seq2seq
|
||||
models which have the encoder and decoder exported as separate ONNX files.
|
||||
|
||||
Args:
|
||||
reference_model_inputs ([`Mapping[str, Tensor]`):
|
||||
Reference inputs for the model.
|
||||
|
||||
Returns:
|
||||
`Mapping[str, Tensor]`: The mapping holding the kwargs to provide to the model's forward function
|
||||
"""
|
||||
return reference_model_inputs
|
||||
|
||||
def patch_ops(self):
|
||||
for spec in self._patching_specs:
|
||||
custom_op = spec.custom_op if spec.op_wrapper is None else spec.op_wrapper(spec.custom_op)
|
||||
setattr(spec.o, spec.name, custom_op)
|
||||
|
||||
def restore_ops(self):
|
||||
for spec in self._patching_specs:
|
||||
orig_op = spec.orig_op if spec.op_wrapper is None else spec.op_wrapper(spec.orig_op)
|
||||
setattr(spec.o, spec.name, orig_op)
|
||||
|
||||
@classmethod
|
||||
def flatten_output_collection_property(cls, name: str, field: Iterable[Any]) -> dict[str, Any]:
|
||||
"""
|
||||
Flatten any potential nested structure expanding the name of the field with the index of the element within the
|
||||
structure.
|
||||
|
||||
Args:
|
||||
name: The name of the nested structure
|
||||
field: The structure to, potentially, be flattened
|
||||
|
||||
Returns:
|
||||
(dict[str, Any]): Outputs with flattened structure and key mapping this new structure.
|
||||
|
||||
"""
|
||||
from itertools import chain
|
||||
|
||||
return {f"{name}.{idx}": item for idx, item in enumerate(chain.from_iterable(field))}
|
||||
|
||||
|
||||
class OnnxConfigWithPast(OnnxConfig, ABC):
|
||||
def __init__(
|
||||
self,
|
||||
config: "PreTrainedConfig",
|
||||
task: str = "default",
|
||||
patching_specs: Optional[list[PatchingSpec]] = None,
|
||||
use_past: bool = False,
|
||||
):
|
||||
super().__init__(config, task=task, patching_specs=patching_specs)
|
||||
self.use_past = use_past
|
||||
|
||||
@classmethod
|
||||
def with_past(cls, config: "PreTrainedConfig", task: str = "default") -> "OnnxConfigWithPast":
|
||||
"""
|
||||
Instantiate a OnnxConfig with `use_past` attribute set to True
|
||||
|
||||
Args:
|
||||
config: The underlying model's config to use when exporting to ONNX
|
||||
|
||||
Returns:
|
||||
OnnxConfig with `.use_past = True`
|
||||
"""
|
||||
return cls(config, task=task, use_past=True)
|
||||
|
||||
@property
|
||||
def outputs(self) -> Mapping[str, Mapping[int, str]]:
|
||||
common_outputs = super().outputs
|
||||
if self.use_past:
|
||||
self.fill_with_past_key_values_(common_outputs, direction="outputs")
|
||||
|
||||
return common_outputs
|
||||
|
||||
@property
|
||||
def values_override(self) -> Optional[Mapping[str, Any]]:
|
||||
if hasattr(self._config, "use_cache"):
|
||||
return {"use_cache": self.use_past}
|
||||
|
||||
return None
|
||||
|
||||
@property
|
||||
def num_layers(self) -> int:
|
||||
"""
|
||||
The number of layers attribute retrieved from the model config. Override this for model configs where the
|
||||
number of layers attribute is not called `num_layers`.
|
||||
"""
|
||||
if not hasattr(self._config, "num_layers"):
|
||||
raise AttributeError(
|
||||
"could not find the number of layers attribute in the model configuration, override the num_layers"
|
||||
" property of the model OnnxConfig to solve this"
|
||||
)
|
||||
return self._config.num_layers
|
||||
|
||||
@property
|
||||
def num_attention_heads(self) -> int:
|
||||
"""
|
||||
The number of attention heads attribute retrieved from the model config. Override this for model configs where
|
||||
the number of attention heads attribute is not called `num_attention_heads`.
|
||||
"""
|
||||
if not hasattr(self._config, "num_attention_heads"):
|
||||
raise AttributeError(
|
||||
"could not find the number of attention heads attribute in the model configuration, override the"
|
||||
" num_attention_heads property of the model OnnxConfig to solve this"
|
||||
)
|
||||
return self._config.num_attention_heads
|
||||
|
||||
    def generate_dummy_inputs(
        self,
        tokenizer: "PreTrainedTokenizerBase",
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        # TODO: should we set seq_length = 1 when self.use_past = True?
        common_inputs = super().generate_dummy_inputs(
            tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
            is_pair=is_pair,
        )

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch

            batch, seqlen = common_inputs["input_ids"].shape
            # Not using the same length for past_key_values
            past_key_values_length = seqlen + 2
            shape = (
                batch,
                self.num_attention_heads,
                past_key_values_length,
                self._config.hidden_size // self.num_attention_heads,
            )

            if "attention_mask" in common_inputs:
                mask_dtype = common_inputs["attention_mask"].dtype
                common_inputs["attention_mask"] = torch.cat(
                    [common_inputs["attention_mask"], torch.ones(batch, past_key_values_length, dtype=mask_dtype)],
                    dim=1,
                )

            common_inputs["past_key_values"] = []
            for _ in range(self.num_layers):
                common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))

        return common_inputs

    def fill_with_past_key_values_(
        self, inputs_or_outputs: Mapping[str, Mapping[int, str]], direction: str, inverted_values_shape: bool = False
    ):
        """
        Fill the `inputs_or_outputs` mapping with past_key_values dynamic axes, considering the direction.

        Args:
            inputs_or_outputs: The mapping to fill.
            direction: either "inputs" or "outputs", it specifies whether `inputs_or_outputs` is the input mapping or
                the output mapping, this is important for axes naming.
            inverted_values_shape:
                If `True`, store values on dynamic axis 1, else on axis 2.
        """
        if direction not in ["inputs", "outputs"]:
            raise ValueError(f'direction must either be "inputs" or "outputs", but {direction} was given')

        name = "past_key_values" if direction == "inputs" else "present"
        for i in range(self.num_layers):
            inputs_or_outputs[f"{name}.{i}.key"] = {0: "batch", 2: "past_sequence + sequence"}
            if inverted_values_shape:
                inputs_or_outputs[f"{name}.{i}.value"] = {0: "batch", 1: "past_sequence + sequence"}
            else:
                inputs_or_outputs[f"{name}.{i}.value"] = {0: "batch", 2: "past_sequence + sequence"}

    def _flatten_past_key_values_(self, flattened_output, name, idx, t):
        flattened_output[f"{name}.{idx}.key"] = t[0]
        flattened_output[f"{name}.{idx}.value"] = t[1]

    def flatten_output_collection_property(self, name: str, field: Iterable[Any]) -> dict[str, Any]:
        flattened_output = {}
        if name in ["present", "past_key_values"]:
            for idx, t in enumerate(field):
                self._flatten_past_key_values_(flattened_output, name, idx, t)
        else:
            flattened_output = super().flatten_output_collection_property(name, field)

        return flattened_output

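# Example of the dynamic axes produced by `fill_with_past_key_values_` for a config whose
# `num_layers` is 2: one key/value pair of axes per layer, e.g.
#   mapping["past_key_values.0.key"] == {0: "batch", 2: "past_sequence + sequence"}
#   mapping["past_key_values.0.value"] == {0: "batch", 2: "past_sequence + sequence"}
# with the "past_key_values.*" prefix used for direction="inputs" and "present.*" for
# direction="outputs".
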
class OnnxSeq2SeqConfigWithPast(OnnxConfigWithPast):
    @property
    def outputs(self) -> Mapping[str, Mapping[int, str]]:
        common_outputs = super(OnnxConfigWithPast, self).outputs
        # Renaming the outputs axes properly.
        for name, axes_names in common_outputs.items():
            sequence_name = "encoder_sequence" if "encoder" in name else "decoder_sequence"
            for axis_idx, name in axes_names.items():
                if "sequence" in name:
                    axes_names[axis_idx] = sequence_name
                # We reset the value as the order in common_outputs (OrderedDict) is lost otherwise
                else:
                    axes_names[axis_idx] = name
        if self.use_past:
            self.fill_with_past_key_values_(common_outputs, direction="outputs")

        return common_outputs

    @property
    def num_layers(self) -> tuple[int, ...]:
        try:
            num_layers = super().num_layers
            num_layers = (num_layers, num_layers)
        except AttributeError:
            if hasattr(self._config, "encoder_layers") and hasattr(self._config, "decoder_layers"):
                num_layers = (self._config.encoder_layers, self._config.decoder_layers)
            else:
                raise AttributeError(
                    "could not find the number of encoder and decoder layers attributes in the model configuration,"
                    " override the num_layers property of the model OnnxConfig to solve this"
                )

        return num_layers

    @property
    def num_attention_heads(self) -> tuple[int, ...]:
        try:
            num_attention_heads = super().num_attention_heads
            num_attention_heads = (num_attention_heads, num_attention_heads)
        except AttributeError:
            if hasattr(self._config, "encoder_attention_heads") and hasattr(self._config, "decoder_attention_heads"):
                num_attention_heads = (self._config.encoder_attention_heads, self._config.decoder_attention_heads)
            else:
                raise AttributeError(
                    "could not find the number of attention heads for the encoder and the decoder attributes in the"
                    " model configuration, override the num_attention_heads property of the model OnnxConfig to solve"
                    " this"
                )
        return num_attention_heads

    def generate_dummy_inputs(
        self,
        tokenizer: Optional["PreTrainedTokenizerBase"],
        batch_size: int = -1,
        seq_length: int = -1,
        is_pair: bool = False,
    ) -> Mapping[str, Any]:
        encoder_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
            tokenizer,
            batch_size=batch_size,
            seq_length=seq_length,
            is_pair=is_pair,
        )

        # Generate decoder inputs
        decoder_seq_length = seq_length if not self.use_past else 1
        decoder_inputs = super(OnnxConfigWithPast, self).generate_dummy_inputs(
            tokenizer,
            batch_size=batch_size,
            seq_length=decoder_seq_length,
            is_pair=is_pair,
        )
        decoder_inputs = {f"decoder_{name}": tensor for name, tensor in decoder_inputs.items()}
        common_inputs = dict(**encoder_inputs, **decoder_inputs)

        if self.use_past:
            if not is_torch_available():
                raise ValueError("Cannot generate dummy past_keys inputs without PyTorch installed.")
            else:
                import torch
            batch = common_inputs["input_ids"].shape[0]
            encoder_seq_length = common_inputs["input_ids"].shape[1]
            decoder_seq_length = common_inputs["decoder_input_ids"].shape[1]
            num_encoder_attention_heads, num_decoder_attention_heads = self.num_attention_heads
            encoder_shape = (
                batch,
                num_encoder_attention_heads,
                encoder_seq_length,
                self._config.hidden_size // num_encoder_attention_heads,
            )
            decoder_shape = (
                batch,
                num_decoder_attention_heads,
                # Not using the same length for past_key_values
                decoder_seq_length + 3,
                self._config.hidden_size // num_decoder_attention_heads,
            )

            common_inputs["past_key_values"] = []
            # If the number of encoder and decoder layers are present in the model configuration, both are considered
            num_encoder_layers, num_decoder_layers = self.num_layers
            min_num_layers = min(num_encoder_layers, num_decoder_layers)
            max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
            remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"

            for _ in range(min_num_layers):
                # For encoder-decoder models, past_key_values contains pre-computed values for both the encoder and
                # the decoder layers, hence a tuple of 4 tensors instead of 2
                common_inputs["past_key_values"].append(
                    (
                        torch.zeros(decoder_shape),
                        torch.zeros(decoder_shape),
                        torch.zeros(encoder_shape),
                        torch.zeros(encoder_shape),
                    )
                )

            # TODO: test this.
            shape = encoder_shape if remaining_side_name == "encoder" else decoder_shape
            for _ in range(min_num_layers, max_num_layers):
                common_inputs["past_key_values"].append((torch.zeros(shape), torch.zeros(shape)))

        return common_inputs

    def fill_with_past_key_values_(self, inputs_or_outputs: Mapping[str, Mapping[int, str]], direction: str):
        if direction not in ["inputs", "outputs"]:
            raise ValueError(f'direction must either be "inputs" or "outputs", but {direction} was given')

        name = "past_key_values" if direction == "inputs" else "present"

        # If the number of encoder and decoder layers are present in the model configuration, both are considered
        num_encoder_layers, num_decoder_layers = self.num_layers
        min_num_layers = min(num_encoder_layers, num_decoder_layers)
        max_num_layers = max(num_encoder_layers, num_decoder_layers) - min_num_layers
        remaining_side_name = "encoder" if num_encoder_layers > num_decoder_layers else "decoder"

        encoder_sequence = "past_encoder_sequence"
        decoder_sequence = "past_decoder_sequence" if direction == "inputs" else "past_decoder_sequence + sequence"

        for i in range(min_num_layers):
            inputs_or_outputs[f"{name}.{i}.decoder.key"] = {0: "batch", 2: decoder_sequence}
            inputs_or_outputs[f"{name}.{i}.decoder.value"] = {0: "batch", 2: decoder_sequence}
            inputs_or_outputs[f"{name}.{i}.encoder.key"] = {0: "batch", 2: encoder_sequence}
            inputs_or_outputs[f"{name}.{i}.encoder.value"] = {0: "batch", 2: encoder_sequence}

        for i in range(min_num_layers, max_num_layers):
            if remaining_side_name == "encoder":
                axes_info = {0: "batch", 2: encoder_sequence}
            else:
                axes_info = {0: "batch", 2: decoder_sequence}
            inputs_or_outputs[f"{name}.{i}.{remaining_side_name}.key"] = axes_info

    def _flatten_past_key_values_(self, flattened_output, name, idx, t):
        flattened_output[f"{name}.{idx}.decoder.key"] = t[0]
        flattened_output[f"{name}.{idx}.decoder.value"] = t[1]
        flattened_output[f"{name}.{idx}.encoder.key"] = t[2]
        flattened_output[f"{name}.{idx}.encoder.value"] = t[3]
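For reference, a minimal sketch of how a model-specific config could hook into the classes above. `ToyDecoderOnnxConfig`, the chosen attribute values and the import path are hypothetical, assumed only for illustration; the base-class behaviour shown (`with_past`, `values_override`, `fill_with_past_key_values_`) is the one defined in the file above.

```python
from collections import OrderedDict

from transformers import PretrainedConfig
from transformers.onnx import OnnxConfigWithPast  # import path assumed from the file layout above


class ToyDecoderOnnxConfig(OnnxConfigWithPast):
    # Hypothetical decoder-only config, only to exercise the helpers defined above.
    @property
    def inputs(self):
        common = OrderedDict(
            [("input_ids", {0: "batch", 1: "sequence"}), ("attention_mask", {0: "batch", 1: "sequence"})]
        )
        if self.use_past:
            self.fill_with_past_key_values_(common, direction="inputs")
        return common


config = PretrainedConfig(num_layers=2, num_attention_heads=4, hidden_size=64, use_cache=True)
onnx_config = ToyDecoderOnnxConfig.with_past(config, task="default")
print(onnx_config.values_override)  # {'use_cache': True}
print(list(onnx_config.inputs))     # ['input_ids', 'attention_mask', 'past_key_values.0.key', ...]
```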
src/transformers/onnx/utils.py (new file, 109 lines) @@ -0,0 +1,109 @@
# Copyright 2021 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from ctypes import c_float, sizeof
from enum import Enum
from typing import TYPE_CHECKING, Optional, Union


if TYPE_CHECKING:
    from .. import AutoFeatureExtractor, AutoProcessor, AutoTokenizer  # tests_ignore


class ParameterFormat(Enum):
    Float = c_float

    @property
    def size(self) -> int:
        """
        Number of bytes required for this data type.

        Returns:
            Integer > 0
        """
        return sizeof(self.value)

def compute_effective_axis_dimension(dimension: int, fixed_dimension: int, num_token_to_add: int = 0) -> int:
    """
    Compute the static dimension to use for an axis when generating dummy inputs.

    Args:
        dimension: The requested dimension for the axis (<= 0 means the axis is dynamic).
        fixed_dimension: The fixed value to fall back to when the axis is dynamic.
        num_token_to_add: Number of tokens (e.g. special tokens) to subtract from the dimension.

    Returns:
        The effective dimension to use for the axis.
    """
    # < 0 is possible if using a dynamic axis
    if dimension <= 0:
        dimension = fixed_dimension

    dimension -= num_token_to_add
    return dimension


def compute_serialized_parameters_size(num_parameters: int, dtype: ParameterFormat) -> int:
    """
    Compute the size taken by all the parameters in the given storage format when serializing the model.

    Args:
        num_parameters: Number of parameters to be saved.
        dtype: The data format in which each parameter will be saved.

    Returns:
        Size (in bytes) taken to save all the parameters.
    """
    return num_parameters * dtype.size


def get_preprocessor(model_name: str) -> Optional[Union["AutoTokenizer", "AutoFeatureExtractor", "AutoProcessor"]]:
    """
    Gets a preprocessor (tokenizer, feature extractor or processor) that is available for `model_name`.

    Args:
        model_name (`str`): Name of the model for which a preprocessor is loaded.

    Returns:
        `Optional[Union[AutoTokenizer, AutoFeatureExtractor, AutoProcessor]]`:
            If a processor is found, it is returned. Otherwise, if a tokenizer or a feature extractor exists, it is
            returned. If both a tokenizer and a feature extractor exist, an error is raised. The function returns
            `None` if no preprocessor is found.
    """
    # Avoid circular imports by only importing this here.
    from .. import AutoFeatureExtractor, AutoProcessor, AutoTokenizer  # tests_ignore

    try:
        return AutoProcessor.from_pretrained(model_name)
    except (ValueError, OSError, KeyError):
        tokenizer, feature_extractor = None, None
        try:
            tokenizer = AutoTokenizer.from_pretrained(model_name)
        except (OSError, KeyError):
            pass
        try:
            feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
        except (OSError, KeyError):
            pass

        if tokenizer is not None and feature_extractor is not None:
            raise ValueError(
                f"Couldn't auto-detect preprocessor for {model_name}. Found both a tokenizer and a feature extractor."
            )
        elif tokenizer is None and feature_extractor is None:
            return None
        elif tokenizer is not None:
            return tokenizer
        else:
            return feature_extractor
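A quick usage sketch of the helpers above. The import path mirrors the file location shown (`transformers.onnx.utils`); the parameter count is an arbitrary example value.

```python
from transformers.onnx.utils import (
    ParameterFormat,
    compute_effective_axis_dimension,
    compute_serialized_parameters_size,
    get_preprocessor,
)

# A dynamic axis (dimension <= 0) falls back to the fixed default:
compute_effective_axis_dimension(-1, fixed_dimension=2)                       # -> 2
# Special tokens are subtracted from the effective sequence length:
compute_effective_axis_dimension(-1, fixed_dimension=8, num_token_to_add=2)   # -> 6

# float32 parameters take 4 bytes each when serialized:
compute_serialized_parameters_size(66_000_000, ParameterFormat.Float)          # -> 264_000_000

# Resolve whichever preprocessor the checkpoint ships with (a tokenizer here):
preprocessor = get_preprocessor("distilbert/distilbert-base-uncased")
```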
@@ -209,6 +209,7 @@ if is_peft_available():

if is_accelerate_available():
    from accelerate import Accelerator, skip_first_batches
    from accelerate import __version__ as accelerate_version
    from accelerate.state import AcceleratorState
    from accelerate.utils import (
        DataLoaderConfiguration,
@@ -4966,12 +4967,7 @@ class Trainer:
        # this would have been updated above, no need for it anymore
        accelerator_config.pop("gradient_accumulation_kwargs")

        args = {
            "mixed_precision": self.args.mixed_precision,
            "dataloader_config": dataloader_config,
            "fsdp_plugin": self.args.fsdp_plugin,
            "deepspeed_plugin": self.args.deepspeed_plugin,
        }
        args = {"deepspeed_plugin": self.args.deepspeed_plugin, "dataloader_config": dataloader_config}

        # We defer compatibility checks to accelerator
        if self.args.parallelism_config is not None:
@@ -4985,7 +4981,7 @@ class Trainer:
        if getattr(self.model, "tp_size", None) is not None and self.model.tp_size > 1:
            self.is_tp_enabled = True
            if self.args.parallelism_config is not None:
            if is_accelerate_available("1.10.1"):
                if version.parse(accelerate_version) > version.parse("1.10.1"):
                    if self.args.parallelism_config is not None:
                        from accelerate import ParallelismConfig

@@ -4993,15 +4989,6 @@ class Trainer:
            else:
                raise ValueError("Requires accelerate>1.10.1 to use Tensor Parallelism.")

        if is_accelerate_available("1.2.0"):
            # if we don't have the correct version, we will rely on the env vars that were set in TrainingArguments
            from accelerate.utils import TorchDynamoPlugin

            dynamo_plugin = TorchDynamoPlugin(
                backend=self.args.torch_compile_backend, mode=self.args.torch_compile_mode
            )
            args["dynamo_plugin"] = dynamo_plugin

        # create accelerator object
        self.accelerator = Accelerator(**args)
        # some Trainer classes need to use `gather` instead of `gather_for_metrics`, thus we store a flag
@@ -462,8 +462,6 @@ class TrainingArguments:
            fsdp json config file (e.g., `fsdp_config.json`) or an already loaded json file as `dict`.

            A list of config options:
            - fsdp_version (`int`, *optional*, defaults to `1`):
                The version of FSDP to use. Defaults to 1.
            - min_num_params (`int`, *optional*, defaults to `0`):
                FSDP's minimum number of parameters for Default Auto Wrapping. (useful only when `fsdp` field is
                passed).
@@ -1536,28 +1534,6 @@ class TrainingArguments:

        self.optim = OptimizerNames(self.optim)

        # We need to get from the env as we are setting deepspeed mixed precision afterwards
        self.mixed_precision = os.environ.get("ACCELERATE_MIXED_PRECISION", "no")
        if self.fp16:
            self.mixed_precision = "fp16"
        elif self.bf16:
            self.mixed_precision = "bf16"

        if (self.torch_compile_mode is not None or self.torch_compile_backend is not None) and not self.torch_compile:
            self.torch_compile = True
        if self.torch_compile and self.torch_compile_backend is None:
            if not self.use_cpu and is_torch_hpu_available():
                self.torch_compile_backend = "hpu_backend"
            else:
                self.torch_compile_backend = "inductor"

        if self.torch_compile:
            # TODO: remove this once we've bumped the minimum accelerate version
            if not is_accelerate_available("1.2.0"):
                os.environ["ACCELERATE_DYNAMO_BACKEND"] = self.torch_compile_backend
                if self.torch_compile_mode is not None:
                    os.environ["ACCELERATE_DYNAMO_MODE"] = self.torch_compile_mode

        # We need to setup the accelerator config here *before* the first call to `self.device`
        if is_accelerate_available():
            if not isinstance(self.accelerator_config, AcceleratorConfig):
@@ -1584,6 +1560,22 @@ class TrainingArguments:
        if is_torch_available():
            self.device

        if (self.torch_compile_mode is not None or self.torch_compile_backend is not None) and not self.torch_compile:
            self.torch_compile = True
        if self.torch_compile and self.torch_compile_backend is None:
            if not self.use_cpu and is_torch_hpu_available():
                self.torch_compile_backend = "hpu_backend"
            else:
                self.torch_compile_backend = "inductor"

        # accelerate integration for torch compile
        if self.torch_compile:
            # set env vars for accelerate
            prefix = "ACCELERATE_DYNAMO_"
            os.environ[prefix + "BACKEND"] = self.torch_compile_backend
            if self.torch_compile_mode is not None:
                os.environ[prefix + "MODE"] = self.torch_compile_mode

        if is_torch_available() and self.torch_compile:
            if is_torch_tf32_available():
                if self.tf32 is None and not self.fp16 or self.bf16:
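# For example (following the branch above): torch_compile=True with no explicit backend on a
# non-CPU, non-HPU machine ends up exporting ACCELERATE_DYNAMO_BACKEND=inductor, while
# ACCELERATE_DYNAMO_MODE is only exported when torch_compile_mode is set.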
@@ -1620,6 +1612,14 @@ class TrainingArguments:
                    torch.backends.cudnn.allow_tf32 = False
            # no need to assert on else

        # if training args is specified, it will override the one specified in the accelerate config
        mixed_precision_dtype = os.environ.get("ACCELERATE_MIXED_PRECISION", "no")
        if self.fp16:
            mixed_precision_dtype = "fp16"
        elif self.bf16:
            mixed_precision_dtype = "bf16"
        os.environ["ACCELERATE_MIXED_PRECISION"] = mixed_precision_dtype

        if self.report_to == "all" or self.report_to == ["all"]:
            # Import at runtime to avoid a circular import.
            from .integrations import get_available_reporting_integrations
@@ -1648,19 +1648,130 @@ class TrainingArguments:
        if not isinstance(self.warmup_steps, int) or self.warmup_steps < 0:
            raise ValueError("warmup_steps must be of type int and must be 0 or a positive integer.")

        if self.fsdp is None:
            self.fsdp = []
        elif self.fsdp is True:
            self.fsdp = [FSDPOption.FULL_SHARD]
        elif isinstance(self.fsdp, str):
            self.fsdp = [FSDPOption(s) for s in self.fsdp.split()]

        if self.fsdp == [FSDPOption.OFFLOAD]:
            raise ValueError(
                "`--fsdp offload` can't work on its own. It needs to be added to `--fsdp full_shard` or "
                '`--fsdp shard_grad_op`. For example, `--fsdp "full_shard offload"`.'
            )
        elif FSDPOption.FULL_SHARD in self.fsdp and FSDPOption.SHARD_GRAD_OP in self.fsdp:
            raise ValueError("`--fsdp full_shard` is not compatible with `--fsdp shard_grad_op`.")

        if self.gradient_checkpointing and (
            FSDPOption.FULL_SHARD in self.fsdp or FSDPOption.HYBRID_SHARD in self.fsdp
        ):
            logger.warning(
                "When using FSDP full shard, instead of using `gradient_checkpointing` in TrainingArguments, please"
                " use `activation_checkpointing` in `fsdp_config`. The former introduces a redundant AllGather"
                " operation in backward pass. Reference: https://github.com/huggingface/transformers/issues/30404"
            )

        if self.fsdp_config is None:
            self.fsdp_config = {}

        if isinstance(self.fsdp_config, str):
            if len(self.fsdp) == 0:
                warnings.warn("`--fsdp_config` is useful only when `--fsdp` is specified.")
            with open(self.fsdp_config, encoding="utf-8") as f:
                self.fsdp_config = json.load(f)

        if self.fsdp_config is not None and isinstance(self.fsdp_config, dict):
            for k in list(self.fsdp_config.keys()):
                if k.startswith("fsdp_"):
                    v = self.fsdp_config.pop(k)
                    self.fsdp_config[k[5:]] = v

        self.fsdp_config["min_num_params"] = self.fsdp_config.get("min_num_params", 0)

        # if fsdp_config["transformer_layer_cls_to_wrap"] is specified as a string, convert it to a list with a single object
        if isinstance(self.fsdp_config.get("transformer_layer_cls_to_wrap", None), str):
            self.fsdp_config["transformer_layer_cls_to_wrap"] = [self.fsdp_config["transformer_layer_cls_to_wrap"]]

        if len(self.fsdp) == 0 and self.fsdp_config["min_num_params"] > 0:
            warnings.warn("`min_num_params` is useful only when `--fsdp` is specified.")

        if len(self.fsdp) == 0 and self.fsdp_config.get("transformer_layer_cls_to_wrap", None) is not None:
            warnings.warn("`transformer_layer_cls_to_wrap` is useful only when `--fsdp` is specified.")

        if (
            len(self.fsdp) > 0
            and self.fsdp_config["min_num_params"] > 0
            and self.fsdp_config.get("transformer_layer_cls_to_wrap", None) is not None
        ):
            raise ValueError("`min_num_params` and `transformer_layer_cls_to_wrap` are mutually exclusive.")
        self.fsdp_config["xla"] = self.fsdp_config.get("xla", False)
        self.fsdp_config["xla_fsdp_v2"] = self.fsdp_config.get("xla_fsdp_v2", False)
        self.fsdp_config["xla_fsdp_grad_ckpt"] = self.fsdp_config.get("xla_fsdp_grad_ckpt", False)
        if self.fsdp_config["xla"]:
            if len(self.fsdp) > 0:
                # store XLA fsdp configuration parameters into a dictionary
                # Copy the config to avoid modifying the original config (which may be used for JSON serialization)
                self.xla_fsdp_config = self.fsdp_config.get("xla_fsdp_settings", {}).copy()
                # apply appropriate string to torch.dtype conversions for parameters
                if "compute_dtype" in self.xla_fsdp_config:
                    self.xla_fsdp_config["compute_dtype"] = getattr(torch, self.xla_fsdp_config["compute_dtype"])
                if "buffer_dtype" in self.xla_fsdp_config:
                    self.xla_fsdp_config["buffer_dtype"] = getattr(torch, self.xla_fsdp_config["buffer_dtype"])
            else:
                warnings.warn("XLA FSDP can be used only when `--fsdp` is specified.")
        else:
            if self.fsdp_config["xla_fsdp_grad_ckpt"]:
                warnings.warn("`--xla_fsdp_grad_ckpt` is useful only when `--xla` is set to true.")

        # accelerate integration for FSDP
        if len(self.fsdp) > 0 and not self.fsdp_config["xla"]:
            os.environ["ACCELERATE_USE_FSDP"] = "true"
            from accelerate.utils.constants import (
                FSDP_AUTO_WRAP_POLICY,
                FSDP_SHARDING_STRATEGY,
            )

            prefix = "FSDP_"
            for fsdp_option in self.fsdp:
                if fsdp_option.upper() in FSDP_SHARDING_STRATEGY:
                    # set environment variable for FSDP sharding strategy
                    os.environ[f"{prefix}SHARDING_STRATEGY"] = str(
                        FSDP_SHARDING_STRATEGY.index(fsdp_option.upper()) + 1
                    )
                elif fsdp_option == FSDPOption.OFFLOAD:
                    os.environ[f"{prefix}OFFLOAD_PARAMS"] = "true"
                elif fsdp_option == FSDPOption.AUTO_WRAP:
                    os.environ[f"{prefix}AUTO_WRAP_POLICY"] = FSDP_AUTO_WRAP_POLICY[0]
                    if self.fsdp_config["min_num_params"] > 0:
                        os.environ[f"{prefix}MIN_NUM_PARAMS"] = str(self.fsdp_config["min_num_params"])
                        os.environ[f"{prefix}AUTO_WRAP_POLICY"] = FSDP_AUTO_WRAP_POLICY[1]
                    elif self.fsdp_config.get("transformer_layer_cls_to_wrap", None) is not None:
                        os.environ[f"{prefix}TRANSFORMER_CLS_TO_WRAP"] = ",".join(
                            self.fsdp_config["transformer_layer_cls_to_wrap"]
                        )
            prefetch_policy = self.fsdp_config.get("backward_prefetch", "NO_PREFETCH")
            os.environ[f"{prefix}BACKWARD_PREFETCH"] = prefetch_policy.upper()
            os.environ[f"{prefix}FORWARD_PREFETCH"] = str(self.fsdp_config.get("forward_prefetch", "false")).lower()

            sync_module_states = str(self.fsdp_config.get("sync_module_states", "true")).lower()
            cpu_ram_efficient_loading = str(self.fsdp_config.get("cpu_ram_efficient_loading", "false")).lower()

            if sync_module_states == "false" and cpu_ram_efficient_loading == "true":
                # In this case, all the processes except the main process would have random weights leading
                # to unexpected behaviour during training, thus throwing error here to prevent it.
                raise ValueError('`sync_module_states` must be `"True"` if `cpu_ram_efficient_loading` is `"True"`')

            os.environ[f"{prefix}SYNC_MODULE_STATES"] = sync_module_states
            os.environ[f"{prefix}CPU_RAM_EFFICIENT_LOADING"] = cpu_ram_efficient_loading

            os.environ[f"{prefix}USE_ORIG_PARAMS"] = str(self.fsdp_config.get("use_orig_params", "true")).lower()

        if isinstance(self.debug, str):
            self.debug = [DebugOption(s) for s in self.debug.split()]
        elif self.debug is None:
            self.debug = []

        self.fsdp_plugin = None
        fsdp_plugin_args = self._process_fsdp_args()
        if fsdp_plugin_args is not None:
            # Accelerate FSDP Plugin
            from accelerate.utils import FullyShardedDataParallelPlugin

            self.fsdp_plugin = FullyShardedDataParallelPlugin(**fsdp_plugin_args)

        self.deepspeed_plugin = None
        if self.deepspeed:
            # - must be run very last in arg parsing, since it will use a lot of these settings.
@@ -1679,13 +1790,15 @@ class TrainingArguments:
            # Accelerate DeepSpeed Plugin
            from accelerate.utils import DeepSpeedPlugin

            os.environ["ACCELERATE_USE_DEEPSPEED"] = "true"
            self.deepspeed_plugin = DeepSpeedPlugin(hf_ds_config=self.hf_deepspeed_config)
        elif strtobool(os.environ.get("ACCELERATE_USE_DEEPSPEED", "false")):
            # Accelerate DeepSpeed Plugin
            from accelerate.utils import DeepSpeedPlugin

            self.deepspeed_plugin = DeepSpeedPlugin()
            self.deepspeed_plugin.set_mixed_precision(self.mixed_precision)
            mixed_precision = os.environ.get("ACCELERATE_MIXED_PRECISION", "no")
            self.deepspeed_plugin.set_mixed_precision(mixed_precision)
            self.deepspeed_plugin.set_deepspeed_weakref()

        if self.use_cpu:
@@ -1766,6 +1879,8 @@ class TrainingArguments:
            else:
                AcceleratorState._reset_state(reset_partial_state=True)
            self.distributed_state = None
        if "ACCELERATE_USE_IPEX" not in os.environ:
            os.environ["ACCELERATE_USE_IPEX"] = "false"

        self._n_gpu = 1
        if self.use_cpu or strtobool(os.environ.get("ACCELERATE_USE_CPU", "False")):
@@ -2647,127 +2762,6 @@ class TrainingArguments:
        self.data_seed = sampler_seed
        return self

    def _process_fsdp_args(self):
        if self.fsdp is None:
            self.fsdp = []
        elif self.fsdp is True:
            self.fsdp = [FSDPOption.FULL_SHARD]
        elif isinstance(self.fsdp, str):
            self.fsdp = [FSDPOption(s) for s in self.fsdp.split()]

        if self.fsdp == [FSDPOption.OFFLOAD]:
            raise ValueError(
                "`--fsdp offload` can't work on its own. It needs to be added to `--fsdp full_shard` or "
                '`--fsdp shard_grad_op`. For example, `--fsdp "full_shard offload"`.'
            )
        elif FSDPOption.FULL_SHARD in self.fsdp and FSDPOption.SHARD_GRAD_OP in self.fsdp:
            raise ValueError("`--fsdp full_shard` is not compatible with `--fsdp shard_grad_op`.")

        if self.gradient_checkpointing and (
            FSDPOption.FULL_SHARD in self.fsdp or FSDPOption.HYBRID_SHARD in self.fsdp
        ):
            logger.warning(
                "When using FSDP full shard, instead of using `gradient_checkpointing` in TrainingArguments, please"
                " use `activation_checkpointing` in `fsdp_config`. The former introduces a redundant AllGather"
                " operation in backward pass. Reference: https://github.com/huggingface/transformers/issues/30404"
            )

        if self.fsdp_config is None:
            self.fsdp_config = {}

        if isinstance(self.fsdp_config, str):
            if len(self.fsdp) == 0:
                warnings.warn("`--fsdp_config` is useful only when `--fsdp` is specified.")
            with open(self.fsdp_config, encoding="utf-8") as f:
                self.fsdp_config = json.load(f)

        if self.fsdp_config is not None and isinstance(self.fsdp_config, dict):
            for k in list(self.fsdp_config.keys()):
                if k.startswith("fsdp_"):
                    v = self.fsdp_config.pop(k)
                    self.fsdp_config[k[5:]] = v

        self.fsdp_config["min_num_params"] = self.fsdp_config.get("min_num_params", 0)

        # if fsdp_config["transformer_layer_cls_to_wrap"] is specified as a string, convert it to a list with a single object
        if isinstance(self.fsdp_config.get("transformer_layer_cls_to_wrap", None), str):
            self.fsdp_config["transformer_layer_cls_to_wrap"] = [self.fsdp_config["transformer_layer_cls_to_wrap"]]

        if len(self.fsdp) == 0 and self.fsdp_config["min_num_params"] > 0:
            warnings.warn("`min_num_params` is useful only when `--fsdp` is specified.")

        if len(self.fsdp) == 0 and self.fsdp_config.get("transformer_layer_cls_to_wrap", None) is not None:
            warnings.warn("`transformer_layer_cls_to_wrap` is useful only when `--fsdp` is specified.")

        if (
            len(self.fsdp) > 0
            and self.fsdp_config["min_num_params"] > 0
            and self.fsdp_config.get("transformer_layer_cls_to_wrap", None) is not None
        ):
            raise ValueError("`min_num_params` and `transformer_layer_cls_to_wrap` are mutually exclusive.")
        self.fsdp_config["xla"] = self.fsdp_config.get("xla", False)
        self.fsdp_config["xla_fsdp_v2"] = self.fsdp_config.get("xla_fsdp_v2", False)
        self.fsdp_config["xla_fsdp_grad_ckpt"] = self.fsdp_config.get("xla_fsdp_grad_ckpt", False)
        if self.fsdp_config["xla"]:
            if len(self.fsdp) > 0:
                # store XLA fsdp configuration parameters into a dictionary
                # Copy the config to avoid modifying the original config (which may be used for JSON serialization)
                self.xla_fsdp_config = self.fsdp_config.get("xla_fsdp_settings", {}).copy()
                # apply appropriate string to torch.dtype conversions for parameters
                if "compute_dtype" in self.xla_fsdp_config:
                    self.xla_fsdp_config["compute_dtype"] = getattr(torch, self.xla_fsdp_config["compute_dtype"])
                if "buffer_dtype" in self.xla_fsdp_config:
                    self.xla_fsdp_config["buffer_dtype"] = getattr(torch, self.xla_fsdp_config["buffer_dtype"])
            else:
                warnings.warn("XLA FSDP can be used only when `--fsdp` is specified.")
        else:
            if self.fsdp_config["xla_fsdp_grad_ckpt"]:
                warnings.warn("`--xla_fsdp_grad_ckpt` is useful only when `--xla` is set to true.")

        # accelerate integration for FSDP
        fsdp_plugin_args = None
        if len(self.fsdp) > 0 and not self.fsdp_config["xla"]:
            from accelerate.utils.constants import (
                FSDP_AUTO_WRAP_POLICY,
                FSDP_SHARDING_STRATEGY,
            )

            fsdp_plugin_args = {}
            for fsdp_option in self.fsdp:
                if fsdp_option.upper() in FSDP_SHARDING_STRATEGY:
                    fsdp_plugin_args["sharding_strategy"] = fsdp_option
                elif fsdp_option == FSDPOption.OFFLOAD:
                    fsdp_plugin_args["cpu_offload"] = True
                elif fsdp_option == FSDPOption.AUTO_WRAP:
                    fsdp_plugin_args["auto_wrap_policy"] = FSDP_AUTO_WRAP_POLICY[0]
                    if self.fsdp_config["min_num_params"] > 0:
                        fsdp_plugin_args["min_num_params"] = self.fsdp_config["min_num_params"]
                        fsdp_plugin_args["auto_wrap_policy"] = FSDP_AUTO_WRAP_POLICY[1]
                    elif self.fsdp_config.get("transformer_layer_cls_to_wrap", None) is not None:
                        fsdp_plugin_args["transformer_cls_names_to_wrap"] = ",".join(
                            self.fsdp_config["transformer_layer_cls_to_wrap"]
                        )
            fsdp_plugin_args["fsdp_version"] = self.fsdp_config.get("fsdp_version", 1)
            prefetch_policy = self.fsdp_config.get("backward_prefetch", "NO_PREFETCH")
            fsdp_plugin_args["backward_prefetch"] = prefetch_policy.upper()
            fsdp_plugin_args["forward_prefetch"] = str(self.fsdp_config.get("forward_prefetch", "false")).lower()

            sync_module_states = str(self.fsdp_config.get("sync_module_states", "true")).lower()
            cpu_ram_efficient_loading = str(self.fsdp_config.get("cpu_ram_efficient_loading", "false")).lower()
            if sync_module_states == "false" and cpu_ram_efficient_loading == "true":
                # In this case, all the processes except the main process would have random weights leading
                # to unexpected behaviour during training, thus throwing error here to prevent it.
                raise ValueError('`sync_module_states` must be `"True"` if `cpu_ram_efficient_loading` is `"True"`')

            # we need to set the env here as otherwise we get a warning in accelerate + we need to set it for transformers
            fsdp_plugin_args["cpu_ram_efficient_loading"] = cpu_ram_efficient_loading
            os.environ["FSDP_CPU_RAM_EFFICIENT_LOADING"] = cpu_ram_efficient_loading

            fsdp_plugin_args["sync_module_states"] = sync_module_states
            fsdp_plugin_args["use_orig_params"] = str(self.fsdp_config.get("use_orig_params", "true")).lower()

        return fsdp_plugin_args


class ParallelMode(Enum):
    NOT_PARALLEL = "not_parallel"
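For illustration, a configuration that exercises the branches handled above. The argument values (the `BertLayer` class name, the prefetch policy, the output directory) are examples only, not values taken from this diff.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    fsdp="full_shard auto_wrap offload",  # parsed into [FULL_SHARD, AUTO_WRAP, OFFLOAD]
    fsdp_config={
        "fsdp_version": 1,
        "transformer_layer_cls_to_wrap": "BertLayer",  # a str is promoted to ["BertLayer"]
        "backward_prefetch": "backward_pre",
        "forward_prefetch": "false",
        "sync_module_states": "true",
        "cpu_ram_efficient_loading": "false",
        "use_orig_params": "true",
    },
)
# With the new code path, `_process_fsdp_args()` returns the kwargs used to build an
# accelerate `FullyShardedDataParallelPlugin`, stored on `args.fsdp_plugin`, instead of
# exporting one FSDP_* environment variable per option.
```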
@@ -2161,7 +2161,7 @@ def create_import_structure_from_path(module_path):
        {
            'albert': {
                frozenset(): {
                    'configuration_albert': {'AlbertConfig'}
                    'configuration_albert': {'AlbertConfig', 'AlbertOnnxConfig'}
                },
                frozenset({'tokenizers'}): {
                    'tokenization_albert_fast': {'AlbertTokenizerFast'}
@@ -2337,7 +2337,7 @@ def spread_import_structure(nested_import_structure):
        {
            'albert': {
                frozenset(): {
                    'configuration_albert': {'AlbertConfig'}
                    'configuration_albert': {'AlbertConfig', 'AlbertOnnxConfig'}
                },
                frozenset({'tokenizers'}): {
                    'tokenization_albert_fast': {'AlbertTokenizerFast'}
@@ -2364,7 +2364,7 @@ def spread_import_structure(nested_import_structure):
                'albert.tokenization_albert_fast': {'AlbertTokenizerFast'}
            },
            frozenset(): {
                'albert.configuration_albert': {'AlbertConfig'},
                'albert.configuration_albert': {'AlbertConfig', 'AlbertOnnxConfig'},
                'align.processing_align': {'AlignProcessor'},
                'align.configuration_align': {'AlignConfig', 'AlignTextConfig', 'AlignVisionConfig'},
                'altclip.configuration_altclip': {'AltCLIPConfig', 'AltCLIPTextConfig', 'AltCLIPVisionConfig'},
@@ -2465,7 +2465,7 @@ def define_import_structure(module_path: str, prefix: str | None = None) -> IMPO
                'albert.tokenization_albert_fast': {'AlbertTokenizerFast'}
            },
            frozenset(): {
                'albert.configuration_albert': {'AlbertConfig'},
                'albert.configuration_albert': {'AlbertConfig', 'AlbertOnnxConfig'},
                'align.processing_align': {'AlignProcessor'},
                'align.configuration_align': {'AlignConfig', 'AlignTextConfig', 'AlignVisionConfig'},
                'altclip.configuration_altclip': {'AltCLIPConfig', 'AltCLIPTextConfig', 'AltCLIPVisionConfig'},
src/transformers/utils/processor_visualizer_utils.py (new file, 488 lines) @@ -0,0 +1,488 @@
import re
from typing import Optional, Union

import matplotlib.pyplot as plt
import numpy as np
import requests
from PIL import Image

from ..image_transforms import convert_to_rgb, to_pil_image, unnormalize
from ..image_utils import ChannelDimension, infer_channel_dimension_format
from ..models.auto import AutoConfig, AutoProcessor


# Architectures known to fail with this util; raise immediately for these:
INCOMPATIBLE_MODELS = [
    "bit",
    "colpali",
    "colqwen2",
    "convnext",
    "d_fine",
    "data2vec",
    "efficientloftr",
    "efficientnet",
    "fuyu",
    "gemma3",
    "glm4v",
    "glpn",
    "hgnet_v2",
    "hiera",
    "internvl",
    "janus",
    "layoutlmv3",
    "levit",
    "lightglue",
    "llama4",
    "mistral3",
    "mllama",
    "mobilevit",
    "mobilevitv2",
    "musicgen",
    "musicgen_melody",
    "oneformer",
    "perceiver",
    "perception_lm",
    "phi4_multimodal",
    "regnet",
    "resnet",
    "superglue",
    "superpoint",
    "swin2sr",
    "timm_wrapper",
    "tvp",
    "udop",
    "vitmatte",
    "vitpose",
    "vjepa2",
    "whisper",
]


DEFAULT_IMAGE_URL = (
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/hf-logo-224x224.png"
)

def _looks_like_global(tile: np.ndarray, base: Image.Image, *, mae_tol: float = 0.05) -> bool:
    """
    Heuristic to check if a tile is a downscaled version of the original image.
    Uses mean absolute error with a strict threshold.
    """
    base_r = base.convert("RGB").resize(tile.shape[:2][::-1], Image.BILINEAR)
    base_np = np.asarray(base_r).astype(np.float32) / 255.0

    tile_f32 = tile.astype(np.float32)
    if tile_f32.max() > 1.5:
        tile_f32 /= 255.0

    mae = np.abs(tile_f32 - base_np).mean()
    return mae < mae_tol


def _find_global_tile_index(tiles: np.ndarray, base: Image.Image) -> Optional[int]:
    """
    Find which tile (if any) is the global/downscaled image.
    Checks first and last tiles only, as models place global images at these positions.

    Returns:
        Index of global tile (0 or len-1), or None if not found
    """
    if tiles.shape[0] <= 1:
        return None

    if _looks_like_global(tiles[0], base):
        return 0

    if _looks_like_global(tiles[-1], base):
        return tiles.shape[0] - 1

    return None

class ImageVisualizer:
    def __init__(self, repo_id: str):
        self.processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=False)
        self.config = AutoConfig.from_pretrained(repo_id, trust_remote_code=False)

        if hasattr(self.processor, "image_processor"):
            image_processor = self.processor.image_processor
        elif hasattr(self.processor, "image_mean"):
            image_processor = self.processor  # weak test, but works most of the time
        else:
            raise ValueError(f"No image processor found for {repo_id}.")

        self.channel_means = getattr(image_processor, "image_mean", [0.485, 0.456, 0.406])
        self.channel_stds = getattr(image_processor, "image_std", [0.229, 0.224, 0.225])
        if hasattr(self.processor, "image_token"):
            self.image_token_marker = self.processor.image_token
        elif hasattr(self.processor, "image_token_id"):
            self.image_token_marker = self.processor.decode(self.processor.image_token_id)
        else:
            self.image_token_marker = "<image>"

        self.default_prompt = f"{self.image_token_marker} How does it look?"

        self.vision_config = getattr(self.config, "vision_config", None)
        self.patch_size = getattr(self.vision_config, "patch_size", getattr(image_processor, "patch_size", 14))
        self.merge_size = getattr(image_processor, "merge_size", 1)

    def _prepare_images_for_display(self, image_array: np.ndarray) -> np.ndarray:
        """
        Convert unnormalized images to NHWC format for display, flattening any extra batch dimensions.

        Args:
            image_array: Array of shape [..., C, H, W] or [..., H, W, C]

        Returns:
            Array of shape [N, H, W, C] suitable for plotting
        """
        input_format = infer_channel_dimension_format(image_array)

        if input_format == ChannelDimension.FIRST:
            if image_array.ndim == 3:
                image_array = image_array[np.newaxis, ...]
            elif image_array.ndim > 4:
                batch_size = int(np.prod(image_array.shape[: image_array.ndim - 3]))
                num_channels, height, width = image_array.shape[-3:]
                image_array = image_array.reshape(batch_size, num_channels, height, width)

            if image_array.ndim == 4:
                image_array = np.transpose(image_array, (0, 2, 3, 1))
        else:
            if image_array.ndim == 3:
                image_array = image_array[np.newaxis, ...]
            elif image_array.ndim > 4:
                batch_size = int(np.prod(image_array.shape[: image_array.ndim - 3]))
                height, width, num_channels = image_array.shape[-3:]
                image_array = image_array.reshape(batch_size, height, width, num_channels)

        return image_array
    def _display_single_image(
        self,
        image_array: np.ndarray,
        show_patch_grid: bool,
        figsize=(7, 7),
        patch_grid_rows=None,
        patch_grid_cols=None,
    ):
        plt.figure(figsize=figsize)
        plt.imshow(image_array)
        plt.xticks([])
        plt.yticks([])

        if show_patch_grid:
            height, width = image_array.shape[:2]

            if patch_grid_rows is not None and patch_grid_cols is not None:
                step_h = height / patch_grid_rows
                step_w = width / patch_grid_cols
                for i in range(1, patch_grid_cols):
                    plt.axvline(i * step_w, color="red", linewidth=0.5)
                for i in range(1, patch_grid_rows):
                    plt.axhline(i * step_h, color="red", linewidth=0.5)
            else:
                step = max(1, min(height, width) // self.patch_size)
                for x_pos in range(0, width, step):
                    plt.axvline(x_pos, color="red", linewidth=0.5)
                for y_pos in range(0, height, step):
                    plt.axhline(y_pos, color="red", linewidth=0.5)

        caption = f"{width}×{height} | mean={', '.join(f'{m:.3f}' for m in self.channel_means)} | std={', '.join(f'{s:.3f}' for s in self.channel_stds)}"
        plt.tight_layout()
        plt.figtext(0.5, -0.02, caption, ha="center", va="top", fontsize=12)
        plt.show()

    def _display_tiled_images(
        self,
        tiles_array: np.ndarray,
        source_image: Image.Image,
        rows: Optional[int] = None,
        cols: Optional[int] = None,
        aspect_ratio: float = 1.0,
        add_grid: bool = True,
        figsize=(7, 7),
        global_tile: Optional[np.ndarray] = None,
    ):
        """
        Display a grid of image tiles with optional global image display.

        Args:
            tiles_array: Array of tiles to display in grid format
            source_image: Original source image
            rows: Number of rows in the grid
            cols: Number of columns in the grid
            aspect_ratio: Aspect ratio for grid layout calculation
            add_grid: Whether to add patch grid overlay
            figsize: Figure size for matplotlib
            global_tile: Optional global/downscaled image to display separately
        """
        num_tiles = tiles_array.shape[0]

        # Infer grid if not specified
        grid_rows, grid_cols = rows, cols
        if grid_rows is None or grid_cols is None:
            if aspect_ratio >= 1:
                guessed_cols = int(np.ceil(np.sqrt(num_tiles * aspect_ratio)))
                guessed_rows = int(np.ceil(num_tiles / max(guessed_cols, 1)))
            else:
                guessed_rows = int(np.ceil(np.sqrt(num_tiles / max(aspect_ratio, 1e-8))))
                guessed_cols = int(np.ceil(num_tiles / max(guessed_rows, 1)))
            grid_rows = grid_rows if grid_rows is not None else guessed_rows
            grid_cols = grid_cols if grid_cols is not None else guessed_cols

        fig, axes = plt.subplots(grid_rows, grid_cols, figsize=figsize, squeeze=False)
        tile_index = 0
        for row_idx in range(grid_rows):
            for col_idx in range(grid_cols):
                ax = axes[row_idx, col_idx]
                if tile_index < num_tiles:
                    tile_image = tiles_array[tile_index]
                    ax.imshow(tile_image)
                    ax.set_xticks([])
                    ax.set_yticks([])

                    if add_grid:
                        height, width = tile_image.shape[:2]
                        step = max(1, min(height, width) // self.patch_size)
                        for x_pos in range(0, width, step):
                            ax.axvline(x_pos, color="red", linewidth=0.5)
                        for y_pos in range(0, height, step):
                            ax.axhline(y_pos, color="red", linewidth=0.5)
                else:
                    ax.axis("off")
                tile_index += 1

        unique = sorted({f"{t.shape[1]}×{t.shape[0]}" for t in tiles_array})
        sizes = ", ".join(unique)
        caption = f"{tiles_array.shape[0]} patches | {sizes} | mean={', '.join(f'{m:.3f}' for m in self.channel_means)} | std={', '.join(f'{s:.3f}' for s in self.channel_stds)}"
        plt.tight_layout()
        fig.text(0.5, 0.02, caption, ha="center", va="bottom", fontsize=12)
        plt.show()

        if global_tile is not None:
            fig2, ax2 = plt.subplots(figsize=figsize)
            ax2.imshow(global_tile)
            ax2.set_xticks([])
            ax2.set_yticks([])
            ax2.set_aspect("equal", adjustable="box")
            fig2.subplots_adjust(left=0, right=1, top=1, bottom=0)
            h0, w0 = global_tile.shape[:2]
            caption = f"Global: {w0}×{h0} | mean={', '.join(f'{m:.3f}' for m in self.channel_means)} | std={', '.join(f'{s:.3f}' for s in self.channel_stds)}"
            fig2.text(0.5, 0.02, caption, ha="center", va="bottom", fontsize=12)
            plt.show()
    def default_message(self, full_output: bool = False) -> str:
        """
        Build a single formatted prompt string using the processor's chat template.
        Contains one image (HF logo) and one user text message.
        If available, adds the generation prompt as well.
        Falls back to a minimal '<image>' string if no template is available.
        """
        # ensure this is a multimodal processor with image + tokenizer
        if not (
            hasattr(self.processor, "attributes")
            and "image_processor" in self.processor.attributes
            and "tokenizer" in self.processor.attributes
        ):
            raise RuntimeError(
                "Processor does not expose both 'image_processor' and 'tokenizer'; cannot build multimodal example."
            )

        conversation = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/hf-logo-224x224.png",
                    },
                    {"type": "text", "text": "Please describe this image."},
                ],
            }
        ]

        try:
            print("For a 224x224 RGB png image: \n")
            decoded_message = self.processor.batch_decode(
                self.processor.apply_chat_template(
                    conversation,
                    add_generation_prompt=True,
                    tokenize=True,
                    return_dict=False,
                    truncation=False,
                ),
                skip_special_tokens=False,
            )[0]

            image_token_string = getattr(self.processor, "image_token", "<image>")
            token_escaped = re.escape(image_token_string)
            image_token_run_pattern = re.compile(rf"(?:{token_escaped})(?:\s*{token_escaped}){{2,}}")

            def compress_image_token_run(match: re.Match) -> str:
                n_tokens = match.group(0).count(image_token_string)
                return f"{image_token_string}[...{n_tokens} tokens...]{image_token_string}"

            if full_output:
                return decoded_message
            else:
                return image_token_run_pattern.sub(compress_image_token_run, decoded_message)

        except ValueError:
            image_token_string = getattr(
                self.processor,
                "image_token",
                getattr(getattr(self.processor, "tokenizer", None), "image_token", "<image>"),
            )
            return f"{image_token_string} {'Please describe this image.'}"
    def visualize(
        self,
        images: Optional[Union[Image.Image, np.ndarray, str, list[Union[Image.Image, np.ndarray, str]]]] = None,
        rows: Optional[int] = None,
        cols: Optional[int] = None,
        add_grid: bool = True,
        figsize=(12, 12),
    ):
        """
        Visualize the model-processed image(s). Only single images are supported.
        If the processor returns multiple tiles, display them in a grid with optional patch grid overlay.
        """
        if images is None:
            images = Image.open(requests.get(DEFAULT_IMAGE_URL, stream=True).raw)

        if not isinstance(images, list):
            images = [images]
        else:
            if len(images) > 1:
                raise ValueError(
                    "You passed a list of several images. Only single images are accepted by the visualizer."
                )

        pil_images = [convert_to_rgb(to_pil_image(x)) for x in images]
        img_width, img_height = pil_images[0].size
        aspect_ratio = img_width / max(img_height, 1)

        processed_inputs = self.processor(images=pil_images, text=self.default_prompt, return_tensors="pt")
        pixel_values = processed_inputs["pixel_values"]

        grid_rows = None
        grid_cols = None
        patch_grid_rows = None
        patch_grid_cols = None

        if hasattr(self.processor, "image_processor") and hasattr(
            self.processor.image_processor, "get_num_patches_from_image_size"
        ):
            num_patches, grid_rows, grid_cols = self.processor.image_processor.get_num_patches_from_image_size(
                img_width, img_height
            )

        if pixel_values.ndim == 2 and "image_grid_thw" in processed_inputs:
            num_patches, flattened_size = pixel_values.shape
            grid_thw = processed_inputs["image_grid_thw"][0]
            temporal_frames, patch_grid_h, patch_grid_w = grid_thw.tolist()

            patch_size = getattr(self.processor.image_processor, "patch_size", 14)
            temporal_patch_size = getattr(self.processor.image_processor, "temporal_patch_size", 1)
            merge_size = getattr(self.processor.image_processor, "merge_size", 2)

            expected_size = temporal_patch_size * 3 * patch_size * patch_size
            if flattened_size == expected_size:
                pixel_values = pixel_values.reshape(num_patches, temporal_patch_size, 3, patch_size, patch_size)
                pixel_values = pixel_values[:, 0, :, :, :]

                super_grid_h = patch_grid_h // merge_size
                super_grid_w = patch_grid_w // merge_size

                pixel_values = pixel_values.reshape(
                    super_grid_h, super_grid_w, merge_size, merge_size, 3, patch_size, patch_size
                )
                pixel_values = pixel_values.permute(0, 2, 1, 3, 4, 5, 6).contiguous()
                pixel_values = pixel_values.reshape(
                    super_grid_h * merge_size, super_grid_w * merge_size, 3, patch_size, patch_size
                )
                pixel_values = pixel_values.permute(0, 3, 1, 4, 2).contiguous()
                pixel_values = pixel_values.reshape(patch_grid_h * patch_size, patch_grid_w * patch_size, 3)
                pixel_values = pixel_values.unsqueeze(0)

                patch_grid_rows = patch_grid_h
                patch_grid_cols = patch_grid_w
            else:
                raise ValueError(
                    f"Cannot reshape pixel_values: expected flattened size {expected_size} "
                    f"(temporal={temporal_patch_size} × channels=3 × patch={patch_size}×{patch_size}), "
                    f"but got {flattened_size}"
                )
        elif pixel_values.ndim == 5:
            batch_size, num_tiles, num_channels, height, width = pixel_values.shape
            pixel_values = pixel_values.view(batch_size * num_tiles, num_channels, height, width)
        elif pixel_values.ndim == 4:
            pass
        elif pixel_values.ndim == 3:
            pixel_values = pixel_values.unsqueeze(0)
        else:
            raise ValueError(f"Unexpected pixel_values shape: {pixel_values.shape}")

        unnormalized = unnormalize(pixel_values, mean=self.channel_means, std=self.channel_stds)
        display_ready = self._prepare_images_for_display(unnormalized)

        if display_ready.shape[0] == 1:
            self._display_single_image(
                display_ready[0],
                show_patch_grid=add_grid,
                figsize=figsize,
                patch_grid_rows=patch_grid_rows,
                patch_grid_cols=patch_grid_cols,
            )
            return

        num_tiles = display_ready.shape[0]
        global_tile = None

        if grid_rows is not None and grid_cols is not None and grid_rows * grid_cols + 1 == num_tiles:
            global_tile = display_ready[-1]
            display_ready = display_ready[:-1]
            num_tiles = display_ready.shape[0]
            if rows is None:
                rows = grid_rows
            if cols is None:
                cols = grid_cols
        else:
            global_idx = _find_global_tile_index(display_ready, pil_images[0])
            if global_idx is not None:
                global_tile = display_ready[global_idx]
                if global_idx == 0:
                    display_ready = display_ready[1:]
                else:
                    display_ready = display_ready[:-1]
                num_tiles = display_ready.shape[0]

        if rows is None or cols is None:
            tile_h, tile_w = display_ready.shape[1:3]
            tile_aspect = tile_w / tile_h if tile_h > 0 else 1.0
            target_aspect = aspect_ratio / tile_aspect

            best_rows, best_cols = 1, num_tiles
            min_diff = float("inf")
            for r in range(1, num_tiles + 1):
                c = int(np.ceil(num_tiles / r))
                diff = abs((c / r) - target_aspect)
                if diff < min_diff:
                    min_diff = diff
                    best_rows, best_cols = r, c

            rows = best_rows
            cols = best_cols

        self._display_tiled_images(
            display_ready,
            pil_images[0],
            rows=rows,
            cols=cols,
            aspect_ratio=aspect_ratio,
            add_grid=add_grid,
            figsize=figsize,
            global_tile=global_tile,
        )
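A minimal usage sketch of the visualizer above. The module path follows the file location shown, and the repo id is only an example of an image-text checkpoint whose model type is not in `INCOMPATIBLE_MODELS`.

```python
from transformers.utils.processor_visualizer_utils import ImageVisualizer

visualizer = ImageVisualizer("llava-hf/llava-1.5-7b-hf")  # example checkpoint
print(visualizer.default_message())  # chat-template prompt, long image-token runs compressed
visualizer.visualize()               # processes the default HF-logo image and plots the tiles
visualizer.visualize(add_grid=False, figsize=(8, 8))
```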
@@ -191,6 +191,7 @@ class TrainerIntegrationFSDP(TestCasePlus, TrainerIntegrationCommon):
            for k, v in trainer.args.fsdp_config.items():
                self.assertTrue(k in self.accelerate_fsdp_config)
                self.assertEqual(v, self.accelerate_fsdp_config[k])
            self.assertEqual(os.environ.get("ACCELERATE_USE_FSDP", "false"), "true")

    @parameterized.expand(params, name_func=_parameterized_custom_name_func)
    def test_fsdp_config(self, sharding_strategy, dtype):
@@ -211,9 +212,10 @@ class TrainerIntegrationFSDP(TestCasePlus, TrainerIntegrationCommon):
            self.assertEqual(trainer.args.fsdp[2], FSDPOption.AUTO_WRAP)
            for k, v in trainer.args.fsdp_config.items():
                self.assertEqual(v, self.fsdp_config[k])
            self.assertEqual(os.environ.get("ACCELERATE_USE_FSDP", "false"), "true")

    @parameterized.expand(params, name_func=_parameterized_custom_name_func)
    def test_fsdp_plugin(self, sharding_strategy, dtype):
    def test_fsdp_config_transformers_auto_wrap(self, sharding_strategy, dtype):
        output_dir = self.get_auto_remove_tmp_dir()
        fsdp_config = deepcopy(self.fsdp_config)
        del fsdp_config["min_num_params"]
@@ -227,25 +229,27 @@ class TrainerIntegrationFSDP(TestCasePlus, TrainerIntegrationCommon):
            "fsdp_config": fsdp_config,
        }
        kwargs[dtype] = True
        prefix = "FSDP_"
        with mockenv_context(**self.dist_env_1_gpu):
            trainer = get_regression_trainer(**kwargs)
            self.assertEqual(trainer.args.fsdp[0], sharding_strategy)
            self.assertEqual(trainer.args.fsdp[1], FSDPOption.OFFLOAD)
            self.assertEqual(trainer.args.fsdp[2], FSDPOption.AUTO_WRAP)
            fsdp_sharding_strategy = str(FSDP_SHARDING_STRATEGY.index(sharding_strategy.upper()) + 1)
            self.assertEqual(os.environ[f"{prefix}SHARDING_STRATEGY"], fsdp_sharding_strategy)
            self.assertEqual(os.environ[f"{prefix}OFFLOAD_PARAMS"], "true")
            self.assertEqual(os.environ[f"{prefix}AUTO_WRAP_POLICY"], "TRANSFORMER_BASED_WRAP")
            self.assertEqual(
                trainer.args.fsdp_plugin.sharding_strategy.value,
                FSDP_SHARDING_STRATEGY.index(sharding_strategy.upper()) + 1,
                os.environ[f"{prefix}TRANSFORMER_CLS_TO_WRAP"], ",".join(fsdp_config["transformer_layer_cls_to_wrap"])
            )
            self.assertEqual(trainer.args.fsdp_plugin.cpu_offload.offload_params, True)
            self.assertEqual(os.environ[f"{prefix}BACKWARD_PREFETCH"], fsdp_config["backward_prefetch"])
            self.assertEqual(os.environ[f"{prefix}FORWARD_PREFETCH"], fsdp_config["forward_prefetch"])
            self.assertEqual(os.environ[f"{prefix}USE_ORIG_PARAMS"], fsdp_config["use_orig_params"])
            self.assertEqual(os.environ[f"{prefix}SYNC_MODULE_STATES"], fsdp_config["sync_module_states"])
            self.assertEqual(
                trainer.args.fsdp_plugin.transformer_cls_names_to_wrap,
                fsdp_config["transformer_layer_cls_to_wrap"],
            )
            self.assertEqual(trainer.args.fsdp_plugin.forward_prefetch, fsdp_config["forward_prefetch"])
            self.assertEqual(trainer.args.fsdp_plugin.sync_module_states, fsdp_config["sync_module_states"])
            self.assertEqual(
                trainer.args.fsdp_plugin.cpu_ram_efficient_loading, fsdp_config["cpu_ram_efficient_loading"]
                os.environ[f"{prefix}CPU_RAM_EFFICIENT_LOADING"], fsdp_config["cpu_ram_efficient_loading"]
            )
            self.assertEqual(os.environ.get("ACCELERATE_USE_FSDP", "false"), "true")

    @parameterized.expand(params, name_func=_parameterized_custom_name_func)
    @require_torch_multi_accelerator
@@ -132,7 +132,7 @@ class TestImportStructures(unittest.TestCase):
        "models": {
            frozenset(): {"dummy_config": {"DummyConfig"}},
            "albert": {
                frozenset(): {"configuration_albert": {"AlbertConfig"}},
                frozenset(): {"configuration_albert": {"AlbertConfig", "AlbertOnnxConfig"}},
                frozenset({"torch"}): {
                    "modeling_albert": {
                        "AlbertForMaskedLM",
@@ -174,7 +174,7 @@ class TestImportStructures(unittest.TestCase):
        frozenset(): {
            "dummy_non_model": {"DummyObject"},
            "models.dummy_config": {"DummyConfig"},
            "models.albert.configuration_albert": {"AlbertConfig"},
            "models.albert.configuration_albert": {"AlbertConfig", "AlbertOnnxConfig"},
            "models.llama.configuration_llama": {"LlamaConfig"},
            "models.deprecated.transfo_xl.configuration_transfo_xl": {"TransfoXLConfig"},
            "models.deprecated.transfo_xl.tokenization_transfo_xl": {"TransfoXLCorpus", "TransfoXLTokenizer"},
@@ -1045,6 +1045,7 @@ def ignore_undocumented(name: str) -> bool:
        or name.endswith("Layer")
        or name.endswith("Embeddings")
        or name.endswith("Attention")
        or name.endswith("OnnxConfig")
    ):
        return True
    # Submodules are not documented.
@@ -738,6 +738,7 @@ src/transformers/models/yoso/convert_yoso_pytorch_to_pytorch.py
src/transformers/models/yoso/modeling_yoso.py
src/transformers/models/zamba/configuration_zamba.py
src/transformers/models/zamba/modeling_zamba.py
src/transformers/onnx/config.py
src/transformers/optimization.py
src/transformers/pipelines/audio_classification.py
src/transformers/pipelines/audio_utils.py