Compare commits

...

1077 Commits

Author SHA1 Message Date
2c8b508cca Release: v4.22.1 2022-09-16 17:58:38 -04:00
654c584f38 Add tests for legacy load by url and fix bugs (#19078) 2022-09-16 17:20:26 -04:00
6d034d58c5 Note about developer mode (#19075) 2022-09-16 16:52:40 -04:00
af20bbb318 Fix tokenizer load from one file (#19073)
* Fix tokenizer load from one file

* Add a test

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2022-09-16 16:52:08 -04:00
1504b5311a Fixing OPT fast tokenizer option. (#18753)
* Fixing OPT fast tokenizer option.

* Remove dependency on `pt`.

* Move it to GPT2 tokenization tests.

* Added a few tests.
2022-09-16 16:51:18 -04:00
defd039bae Move cache: expand error message (#19051) 2022-09-16 16:45:39 -04:00
ad11b79e95 Release: v4.22.0 2022-09-14 14:50:38 -04:00
21823788e3 fix GPT2 token's special_tokens_mask when used with add_bos_token=True (#19036) 2022-09-14 14:49:05 -04:00
680ad0dc4b Re-add support for single url files in objects download (#19014) 2022-09-13 13:18:28 -04:00
c6415fa10d Fix MaskFormerFeatureExtractor instance segmentation preprocessing bug (#18997)
* fix preprocessing for instance segmentation maps

* add support for per-image instance2class_id mapping

* edit docstrings for clarity
2022-09-13 13:16:17 -04:00
d5e1d213c6 Fix tokenizer for XLMRobertaXL (#19004)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-13 13:15:51 -04:00
470799b3a6 Removed issue in wav2vec link (#18945)
Fix connected to [this issue](https://github.com/huggingface/transformers/issues/18944)
2022-09-12 21:59:19 +02:00
4c2e983f44 Fixed typo (#18921)
Fixed typo itmes --> items
2022-09-12 21:03:48 +02:00
1182b945a6 TF: TF 2.10 unpin + related onnx test skips (#18995) 2022-09-12 19:30:27 +01:00
7f4708e1a2 added type hints (#18996) 2022-09-12 19:11:40 +01:00
39b5bb79d9 fix checkpoint name for wav2vec2 conformer (#18994)
* fix checkpoint name for wav2vec2 conformer

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-12 19:39:01 +02:00
8a6928e28b TF: correct TFBart embeddings weights name when load_weight_prefix is passed (#18993) 2022-09-12 18:35:45 +01:00
c126a239bc Fix tflongformer int dtype (#18907)
* Use int64 throughout TFLongFormer

* make style

* Do some more fixed casting in TFLongFormer

* Fix some wonky "is None" conditionals

* Cast all the dtypes, salt the earth

* Fix copies to TFLED as well and do some casting there

* dtype fix in TFLongformer test

* Make fixup

* Expand tolerances on the LED tests too (I think this is a TF32 thing)

* Expand test tolerances for LED a tiny bit (probably a Tensorfloat thing again)
2022-09-12 17:51:10 +01:00
f7ceda345d Align try_to_load_from_cache with huggingface_hub (#18966)
* Align try_to_load_from_cache with huggingface_hub

* Fix tests
2022-09-12 12:09:37 -04:00
cf450b776f Fix TF start docstrings (#18991)
* Update our TF 2.0 input format tip across all models

* make style
2022-09-12 16:33:56 +01:00
adbf3a40de Remove dropout in embedding layer of OPT (#18845) 2022-09-12 16:32:38 +02:00
367026000b create Past CI results as tables for GitHub issue (#18953)
* create Past CI results as tables for GitHub issue

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-12 15:20:31 +02:00
0b36970371 Remove decoder_position_ids from check_decoder_model_past_large_inputs (#18980)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-12 15:19:48 +02:00
a86acb75ad add DDP HPO support for sigopt (#18931)
only main_process will have HPO, and pass argument to other process

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-09-12 07:37:25 -04:00
9faa9f9dac remove unused activation dropout (#18842) 2022-09-12 11:00:24 +02:00
a26114777e Revert "TF: unpin maximum TF version (#18917)" (#18972)
This reverts commit d8cf3b20875baee97f4bea64ffd17670aa57c37b.
2022-09-10 09:11:46 -04:00
d8cf3b2087 TF: unpin maximum TF version (#18917) 2022-09-10 13:33:01 +01:00
00cbadb870 RFC: Replace custom TF embeddings by Keras embeddings (#18939) 2022-09-10 11:34:49 +01:00
855dcae8bb update black target version (#18955)
* update black target version

* add comment

as per https://github.com/huggingface/transformers/pull/18955#issuecomment-1242081649

* revert change

Will only update to 3.7 after black 2023 upgrade in January
2022-09-09 17:30:05 -04:00
645f174286 Exit early in load if no weights are in the sharded state dict (#18937) 2022-09-09 15:07:09 -04:00
660e0b97bd Fix train_step, test_step and tests for CLIP (#18684)
* Fix train_step and test_step, correctly enable CLIP fit test

* Stop using get_args on older Python versions

* Don't use get_origin either

* UnionType is actually even newer, don't use that either

* Apply the same fix to test_loss_computation

* Just realized I was accidentally skipping a bunch of tests!

* Fix test_loss_computation for models without separable labels

* Fix scalar losses in test_step and train_step

* Stop committing your breakpoints

* Fix Swin loss shape

* Fix Tapas loss shape

* Shape fixes for TAPAS, DeIT, HuBERT and ViTMAE

* Add loss computation to TFMobileBertForPreTraining

* make fixup and move copied from statement

* make fixup and move copied from statement

* Correct copied from

* Add labels and next_sentence_label inputs to TFMobileBERT

* Make sure total_loss is always defined

* Update tests/test_modeling_tf_common.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix copied from

* Ensure CTC models get labels in tests

* Ensure CTC models get labels in tests

* Fix tests for vit_mae

* Fix tests for vit_mae

* Fix tests for vit_mae

* Reduce batch size for wav2vec2 testing because it was causing OOM

* Skip some TAPAS tests that are failing

* Skip a failing HuBERT test

* make style

* Fix mobilebertforpretraining test

* Skip Wav2Vec2 tests that use huge amounts of mem

* Skip keras_fit for Wav2Vec2 as well

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2022-09-09 20:01:02 +01:00
f1a6df3210 Generate: Simplify is_pad_token_not_equal_to_eos_token_id (#18933) 2022-09-09 16:44:56 +01:00
85125fcffd Neptune.ai integration improvements (#18934)
* NeptuneCallback improvements

* After review suggestions and deduplication of initial run

* Added volatile checkpoints support due to missing post-rebase commit

* Update README per review comments

- Remove list formatting
- Correct Neptune docs link

Co-authored-by: Sabine <sabine.nyholm@neptune.ai>
2022-09-09 11:37:34 -04:00
e6f221c8d4 [JAX] Replace all jax.tree_* calls with jax.tree_util.tree_* (#18361)
* [JAX] Replace all jax.tree_* calls with jax.tree_util.tree_*

* fix double tree_util
2022-09-09 15:18:56 +02:00
22f7218560 add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (#18686)
* add_ernie

* remove Tokenizer in ernie

* polish code

* format code style

* polish code

* fix style

* update doc

* make fix-copies

* change model name

* change model name

* fix dependency

* add more copied from

* rename ErnieLMHeadModel to ErnieForCausalLM
do not expose ErnieLayer
update doc

* fix

* make style

* polish code

* polish code

* fix

* fix

* fix

* fix

* fix

* final fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-09 07:36:46 -04:00
895c528886 Update translation requests contact (#18941)
* Update TRANSLATING.md

Update the contact to @GuggerSylvain

* Update docs/TRANSLATING.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-09-09 09:15:24 +02:00
bb6f6d5338 Add X-CLIP (#18852)
* First draft

* Improve conversion script

* Make vision encoder work

* More improvements

* Improve conversion script

* Fix quality

* Add MultiframeIntegrationTransformer

* More improvements

* Make MiT output work

* Fix quality

* Add prompts generator

* Add tests

* Fix some tests

* Fix some more tests

* Fix more tests

* Improve conversion script

* Fix model outputs

* Fix more tests

* Add XClipProcessor

* Use processor in conversion script

* Fix integration test

* Update README, fix docs

* Fix all tests

* Add MIT output to XClipOutput

* Create better variable names

* Rename XClip to XCLIP

* Extend conversion script

* Add support for large models

* Add support for 16 frame models

* Add another model'

* Fix module issue

* Apply suggestions from code review

* Add figure to docs

* Fix CLIPProcessor issue

* Apply suggestions from code review

* Delete file

* Convert more checkpoints

* Convert last checkpoint

* Update nielsr to microsoft
2022-09-08 14:50:30 +02:00
9832ac7c73 Fix LayoutXLM wrong link in README (#18932)
* fix LayoutXLM wrong link in README

* fix LayoutXLM worng link in index.mdx
2022-09-08 07:32:41 -04:00
90f6fe9155 Skip some doctests in quicktour (#18927)
* skip some code examples for doctests

* make style

* fix code snippet formatting

* separate code snippet into two blocks
2022-09-07 14:45:22 -07:00
6519150c31 Add image height and width to ONNX dynamic axes (#18915) 2022-09-07 22:42:46 +02:00
737f6ad1f7 Starts on a list of external deps required for dev (#18929)
* Starts on a list of external deps required for dev

I've found that I need to install MeCab manually on my AS Mac.

* Generalizes OS nascent dependency list

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-09-07 16:33:03 -04:00
6394221871 Fix XLA fp16 and bf16 error checking (#18913)
* Fix XLA fp16 and bf16 error checking

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-09-07 15:45:17 -04:00
6690ba3f4d pin TF 2.9.1 for self-hosted CIs (#18925)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-07 19:46:14 +02:00
2ef7742117 Add DocumentQuestionAnswering pipeline (#18414)
* [WIP] Skeleton of VisualQuestionAnweringPipeline extended to support LayoutLM-like models

* Fixup

* Use the full encoding

* Basic refactoring to DocumentQuestionAnsweringPipeline

* Cleanup

* Improve args, docs, and implement preprocessing

* Integrate OCR

* Refactor question_answering pipeline

* Use refactored QA code in the document qa pipeline

* Fix tests

* Some small cleanups

* Use a string type annotation for Image.Image

* Update encoding with image features

* Wire through the basic docs

* Handle invalid response

* Handle empty word_boxes properly

* Docstring fix

* Integrate Donut model

* Fixup

* Incorporate comments

* Address comments

* Initial incorporation of tests

* Address Comments

* Change assert to ValueError

* Comments

* Wrap `score` in float to make it JSON serializable

* Incorporate AutoModeLForDocumentQuestionAnswering changes

* Fixup

* Rename postprocess function

* Fix auto import

* Applying comments

* Improve docs

* Remove extra assets and add copyright

* Address comments

Co-authored-by: Ankur Goyal <ankur@impira.com>
2022-09-07 13:38:49 -04:00
3059d80d80 [DeepSpeed ZeRO3] Fix performance degradation in sharded models (#18911)
* [DeepSpeed] Fix performance degradation in sharded models

* style

* polish

Co-authored-by: Stas Bekman <stas@stason.org>
2022-09-07 07:44:20 -07:00
10c774cf60 remvoe _create_and_check_torch_fx_tracing in specific test files (#18667)
* remvoe _create_and_check_torch_fx_tracing defined in specific model test files

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-07 16:22:09 +02:00
0eabab0998 TF: final bias as a layer in seq2seq models (replicate TFMarian fix) (#18903) 2022-09-07 14:03:02 +01:00
2b9513fdab Update TF fine-tuning docs (#18654)
* Update TF fine-tuning docs

* Fix formatting

* Add some section headers so the right sidebar works better

* Squiggly it

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/training.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Explain things in the text, not the comments

* Make the two dataset creation methods into a list

* Move the advice about collation out of a <Tip>

* Edits for clarity

* Edits for clarity

* Edits for clarity

* Replace `to_tf_dataset` with `prepare_tf_dataset` in the fine-tuning pages

* Restructure the page a little bit

* Restructure the page a little bit

* Restructure the page a little bit

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-09-07 13:30:07 +01:00
d842f2d5b9 update the train_batch_size in case HPO change batch_size_per_device (#18918)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-09-07 08:01:30 -04:00
4f299b2446 Accelerator end training (#18910)
* add accelerator.end_training()

Some trackers need this to end their runs.

* fixup and quality

* add space

* add space again ?!?
2022-09-07 07:46:26 -04:00
7a8118947f Add checks for more workflow jobs (#18905)
* add check for scheduled CI

* Add check to other CIs

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-07 12:51:37 +02:00
c25f27fa6a [VideoMAE] Improve code examples (#18919)
* Simplify code example

* Add seed
2022-09-07 12:24:12 +02:00
0a632f076d Fix incorrect size of input for 1st strided window length in Perplexity of fixed-length models (#18906)
* update the PPL for stride 512

* fix 1st strided window size

* linting

* fix typo

* styling
2022-09-06 15:20:12 -04:00
7d5fde991d unpin slack_sdk version (#18901)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-06 18:42:00 +02:00
71ff88fa4f Further reduce the number of alls to head for cached objects (#18871)
* Further reduce the number of alls to head for cached models/tokenizers/pipelines

* Fix tests

* Address review comments
2022-09-06 12:34:37 -04:00
6678350c01 fixes bugs to handle non-dict output (#18897) 2022-09-06 16:13:34 +03:00
998a90bc7d Fix test_tf_encode_plus_sent_to_model for LayoutLMv3 (#18898)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-06 14:51:03 +02:00
f85acb4d73 Fix decode_input_ids to bare T5Model and improve doc (#18791)
* use tokenizer to output tensor

* add preprocessing for decoder_input_ids for bare T5Model

* add preprocessing to tf and flax

* linting

* linting

* Update src/transformers/models/t5/modeling_flax_t5.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/t5/modeling_tf_t5.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/t5/modeling_t5.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-09-06 14:12:26 +02:00
3b19c0317b updating gather function with gather_for_metrics in run_wav2vec2_pretraining (#18877)
Co-authored-by: Arun Rajaram <arunrajaram@Aruns-MacBook-Pro.local>
2022-09-06 07:36:37 -04:00
Had
734b7e2a5a Mask t5 relative position bias then head pruned (#17968)
* add position bias head masking if heads pruned

* fix pruning function in t5 encoder

* make style

* make fix-copies

* Revert added folder

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-09-06 10:39:31 +02:00
d4dbd7ca59 Generate: get the correct beam index on eos token (#18851) 2022-09-05 19:35:47 +01:00
c6d3daba54 Update Chinese documentation (#18893)
* update the translation
2022-09-05 19:56:12 +02:00
cfd623a859 Add type hints to XLM-Roberta-XL models (#18475)
* Add type hints to XLM-Roberta-XL models

* Format
2022-09-05 13:38:08 +01:00
17c634fd5b Update perf_train_gpu_one.mdx (#18442) 2022-09-05 14:06:36 +02:00
badb9d2aaa Correct naming pegasus x (#18896)
* add first generation tutorial

* [Pegasus X] correct naming

* [Generation] Remove
2022-09-05 11:25:00 +02:00
591cfc6c90 Mention TF and Flax checkpoints (#18894) 2022-09-05 11:09:39 +02:00
7f27e002fd TF: TFMarianMTModel final logits bias as a layer (#18833)
* bias as a layer

* alias the bias (hah, it rhymes)

* add comment with info
2022-09-05 09:20:27 +01:00
65fb71bc76 Add Trainer to quicktour (#18723)
* 📝 update quicktour

* 📝 add trainer section

* 🖍 markdown table, apply feedbacks

*  make style

* add tf training section

* make style
2022-09-02 15:05:31 -05:00
ae32f3afef Finetune guide for semantic segmentation (#18640)
* 📝 first draft

* oops add to toctree

* make style

* 📝 add inference section

* 🖍 make style

* 📝 add images

* 🖍 apply feedbacks

* remove num_labels and pytorch block

* apply feedbacks, add colab notebook

Co-authored-by: Steven <stevhliu@gmail.com>
2022-09-02 14:29:51 -05:00
bf9d506137 Update docs landing page (#18590)
* 📝 update docs landing page

* 🖍 apply feedbacks

* apply feedbacks

* apply feedbacks, use <br> for list
2022-09-02 14:29:06 -05:00
53e33e6f1b PEGASUS-X (#18551)
* PegasusX Initial commit

* rename

* pegasus X implementation

* pegx update

* pegx fix

* pegasus-x fixes

* pegx updates

* cleanup

* cleanup

* cleanup

* tests

* stylefixes

* Documentation update

* Model hub fix

* cleanup

* update

* update

* testfix

* Check fix

* tweaks for merging

* style

* style

* updates for pr

* style

* change pegasus-x repo
2022-09-02 19:54:02 +02:00
ecdf9b06bc Remove cached torch_extensions on CI runners (#18868)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-02 18:17:58 +02:00
4e29b3f884 A script to download artifacts and perform CI error statistics (#18865)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-02 17:59:26 +02:00
9196f48b95 Generate: validate model_kwargs on TF (and catch typos in generate arguments) (#18651) 2022-09-02 16:25:26 +01:00
c5be7cae59 postpone bnb load until it's needed (#18859) 2022-09-02 08:22:46 -07:00
9e346f7436 Fix number of examples for iterable datasets in multiprocessing (#18856)
* Fix number of examples for iterable datasets in multiprocessing

* Add stronger check
2022-09-02 10:49:39 -04:00
0ab465a5d2 pin Slack SDK to 3.18.1 to avoid failing issue (#18869)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-02 16:49:08 +02:00
38c3cd52fb Clean up utils.hub using the latest from hf_hub (#18857)
* Clean up utils.hub using the latest from hf_hub

* Adapt test

* Address review comment

* Fix test
2022-09-02 10:30:06 -04:00
17981faf67 Add OWL-ViT to the appropriate section (#18867)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-09-02 15:59:25 +02:00
c60dd98e87 [LayoutLM] Add clarification to docs (#18716)
* Add clarification

* Add another clarification

* Apply suggestion

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-09-02 14:48:19 +02:00
129d73294e Fix naming issue with ImageToText pipeline (#18864)
Co-authored-by: Olivier Dehaene <olivier@huggingface.co>
2022-09-02 07:55:30 -04:00
9b3eb81014 if learning rate is a tensor, get item (float) (#18861) 2022-09-02 07:46:31 -04:00
142e12afb4 Split docs on modality (#18205)
* update

* 🖍 add missing files

* 📝 add nested sections

* 🖍 align titles with tasks

* oops

* remove quotes from titles
2022-09-01 15:19:11 -05:00
23fab60b67 Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests (#18854)
* Pin revision for tests

* Fixup

* Update revision in models

* Shorten revisions

Co-authored-by: Ankur Goyal <ankur@impira.com>
2022-09-01 12:52:33 -04:00
ddb69e5af8 Add Image To Text Generation pipeline (#18821)
* Add Image2TextGenerationPipeline to supported pipelines

* Add Flax and Tensorflow support

* Add Flax and Tensorflow small tests

* Add default model for Tensorflow

* Add docstring

* Fix doc style

* Add tiny models for pytorch and flax

* Remove flax from pipeline.
Fix tests

* Use ydshieh/vit-gpt2-coco-en as a default for both PyTorch and Tensorflow

* Fix Tensorflow support

Co-authored-by: Olivier Dehaene <olivier@huggingface.co>
2022-09-01 12:07:14 -04:00
c61f116b63 Tie weights after preparing the model in run_clm (#18855) 2022-09-01 12:06:56 -04:00
1c381f3600 Cache results of is_torch_tpu_available() (#18777)
* Cache results of is_torch_tpu_available()

* Update src/transformers/utils/import_utils.py

* Update src/transformers/utils/import_utils.py
2022-09-01 11:45:33 -04:00
954e18ab97 TensorFlow MobileViT (#18555)
* initial implementation.

* add: working model till image classification.

* add: initial implementation that passes intg tests.

Co-authored-by: Amy <aeroberts4444@gmail.com>

* chore: formatting.

* add: tests (still breaking because of config mismatch).

Coo-authored-by: Yih <2521628+ydshieh@users.noreply.github.com>

* add: corrected tests and remaning changes.

* fix code style and repo consistency.

* address PR comments.

* address Amy's comments.

* chore: remove from_pt argument.

* chore: add full-stop.

* fix: TFLite model conversion in the doc.

* Update src/transformers/models/mobilevit/modeling_tf_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/modeling_tf_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/modeling_tf_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/modeling_tf_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/modeling_tf_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply formatting.

* chore: remove comments from the example block.

* remove identation in the example.

Co-authored-by: Amy <aeroberts4444@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-09-01 10:35:15 -04:00
fe58929ad6 Adds timeout argument to training_args to avoid socket timeouts in DDP (#18562)
* chore(training_args): Adds support for timeout argument.

* fix(training_args): Passes make style through changes.

* fix(training_args): Removes wrong docstring sentence.

* fix(training_args): Fixes timeout not being JSON serializable.

* fix(training_args_sm): Also updates timeout to timeout_delta.

* fix(training_args): Fixes PR according to suggestions.
2022-09-01 10:33:53 -04:00
ab663b2274 reflect max_new_tokens in Seq2SeqTrainer (#18786)
* reflect max_new_tokens in gen_kwargs to `trainer.generate()`

* reflect max_new_tokens in `Seq2SeqTrainer`

* remove unnecessary variable

* Trigger CI

* fix style
2022-09-01 09:12:38 -04:00
f719c0377f Minor typo in prose of model outputs documentation. (#18848) 2022-09-01 12:05:40 +02:00
fafbb57df1 Pin rouge_score (#18247)
* Pin rouge_score

* Pin also in dependency_versions_table

* Update excluded versions

* Revert "Update excluded versions"

This reverts commit 0d0362df30a816108835f5c061272ee2bafec270.

* Revert "Revert "Update excluded versions""

This reverts commit 66c47af8a6baff253575631b0ba392e0354b6d56.
2022-09-01 12:04:49 +02:00
e7da38f5dc add a script to get time info. from GA workflow jobs (#18822)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-01 12:02:52 +02:00
6e016634f1 Generate: smaller TF serving test (#18840) 2022-09-01 10:53:39 +01:00
563a8d58db Delete state_dict to release memory as early as possible (#18832)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-09-01 10:55:30 +02:00
a26c752353 Unpin fsspec (#18846) 2022-09-01 10:20:15 +02:00
359f7b4b8d Create pipeline_tutorial.mdx german docs (#18625)
* Create pipeline_tutorial.mdx

* Update _toctree.yml
2022-09-01 09:57:59 +02:00
5d81a56833 Owlvit memory leak fix (#18734)
* fix memory leak
* fix typos
* use singular last hidden state variable names
* eliminate double call to self.owlvit to return last hidden states
* eliminate 2nd call to self.vision_model in OwlViTModel
2022-09-01 10:31:08 +03:00
80367cd1fb Add security warning about the from_pretrained() method (#18801)
* Add security warning about from_pretrained() method

* Add sentence about malware scanner

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2022-08-31 21:48:40 +02:00
7e7f743481 Add SegFormer ONNX support (#18006)
* Add ONNX support

* Make height and width dynamic axes

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-31 20:58:44 +02:00
89514f0541 Improve Text Generation doc (#18788)
* fix args for bram search decoding in generation utils

* fix missing PAD token in gpt2

* add PAD EOS change to TF

* Update src/transformers/generation_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/generation_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/generation_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-08-31 20:30:29 +02:00
86387fe87f Add an option to HfArgumentParser.parse_{dict,json_file} to raise an Exception when there extra keys (#18692)
* Update parser to track unneeded keys, off by default

* Fix formatting

* Fix docstrings and defaults in HfArgparser

* Fix formatting
2022-08-31 20:26:45 +02:00
f210e2a414 Improve GPT2 doc (#18787)
* Minor typo in GPT2 doc

* improve gpt2 label doc

* update dim of label in GPT2ForTokenClassification

* add change to tf
2022-08-31 19:26:39 +02:00
74690b62a1 Pin ffspec (#18837)
* Pin ffspec

* Typo
2022-08-31 19:04:04 +02:00
3b6943e7a3 [DETR] Add num_channels attribute (#18714)
* Add num_channels attribute

* Fix code quality

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-31 18:04:42 +02:00
811c4c9f79 fix bug: register_for_auto_class should be defined on TFPreTrainedModel instead of TFSequenceSummary (#18607) 2022-08-31 16:37:18 +02:00
ee407024c4 Update location identification (#18834) 2022-08-31 15:10:25 +02:00
e4910213be Warn on TPUs when the custom optimizer and model device are not the same (#18668)
* Check optimizer for device on TPU

* Typo
2022-08-31 08:46:31 -04:00
cdde85a0a0 oob performance improvement for cpu DDP (#18595)
* oob performance improvement for cpu DDP

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add is_psutil_available check

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-08-31 14:35:10 +02:00
c3be98ebab Fix cost condition in DetrHungarianMatcher and YolosHungarianMatcher to allow zero-cost (#18647)
* Fix loss condition in DetrHungarianMatcher

* Fix costs condition in YolosHungarianMatcher
2022-08-31 14:28:58 +02:00
fea4636cfa Pin max tf version (#18818) 2022-08-31 10:07:53 +02:00
5c4c869014 Add LayoutLMForQuestionAnswering model (#18407)
* Add LayoutLMForQuestionAnswering model

* Fix output

* Remove TF TODOs

* Add test cases

* Add docs

* TF implementation

* Fix PT/TF equivalence

* Fix loss

* make fixup

* Fix up documentation code examples

* Fix up documentation examples + test them

* Remove LayoutLMForQuestionAnswering from the auto mapping

* Docstrings

* Add better docstrings

* Undo whitespace changes

* Update tokenizers in comments

* Fixup code and remove `from_pt=True`

* Fix tests

* Revert some unexpected docstring changes

* Fix tests by overriding _prepare_for_class

Co-authored-by: Ankur Goyal <ankur@impira.com>
2022-08-31 10:05:33 +02:00
e88e9ff045 Disable nightly CI temporarily (#18820)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-30 18:33:09 +02:00
73c6273d48 Improving the documentation for "word", within the pipeline. (#18763)
* Improving the documentation for "word", within the pipeline.

* Quality.
2022-08-30 15:29:48 +02:00
5727dfcebe Added Docstrings for Deberta and DebertaV2 [PyTorch] (#18610)
* Added Doctest for Deberta Pytorch

* Added path in documentation test file

* Added docstrings for DebertaV2

* Revert "Added docstrings for DebertaV2"

This reverts commit 307185e62a21b3bd0923444cc8a8af1747fd2600.

* Added DebertaV2 Docstrings
2022-08-30 14:46:21 +02:00
a98f6a1da0 LayoutXLMProcessor: ensure 1-to-1 mapping between samples and images, and add test for it (#18774) 2022-08-30 14:43:14 +02:00
220da3b8a1 Adds GroupViT to models exportable with ONNX (#18628)
* groupvit to onnx

* dynamic shape for pixel values dim
2022-08-30 14:31:35 +02:00
46d0e26a27 Adds OWLViT to models exportable with ONNX (#18588)
* onnx conversion for owlvit

* .T to .t()

* dynamic shapes for pixel values
2022-08-30 14:30:59 +02:00
b83796ded7 Remove ViltForQuestionAnswering from check_repo (#18762)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-30 14:15:36 +02:00
ef91a2d135 Run tests if skip condition not met (#18764)
* Run tests if skip condition not met

* Update comment - remove outdated ref to TF 2.8
2022-08-30 14:03:28 +02:00
de8548ebf3 [LayoutLMv3] Add TensorFlow implementation (#18678)
Co-authored-by: Esben Toke Christensen <esben.christensen@visma.com>
Co-authored-by: Lasse Reedtz <lasse.reedtz@visma.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2022-08-30 11:48:11 +01:00
7320d95d98 [Swin, Swinv2] Fix attn_mask dtype (#18803)
* Add dtype

* Fix Swinv2 as well

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-30 12:31:34 +02:00
5c702175eb up (#18805) 2022-08-30 12:30:46 +02:00
da02b4035c Add docstring for BartForCausalLM (#18795)
* add docstring for BartForCausalLM

* doc-style fic
2022-08-30 12:19:03 +02:00
8c4a11493f Revert to and safely handle flag in owlvit config (#18750) 2022-08-29 18:48:24 +02:00
da5bb29219 send model to the correct device (#18800)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-29 18:46:30 +02:00
f1fd460694 Add SegFormer and ViLT links (#18808)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-29 18:46:07 +02:00
169b8cde47 Fix mock in test_cached_files_are_used_when_internet_is_down (#18804) 2022-08-29 15:56:08 +02:00
8b67f20935 Fix memory leak issue in torch_fx tests (#18547)
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-29 11:43:20 +02:00
b10a3b3760 fix a possible typo in auto feature extraction (#18779) 2022-08-29 11:24:53 +02:00
5f06a09b9f fix missing block when there is no failure (#18775)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-29 09:10:13 +02:00
f2fbe44753 Fix broken link DeepSpeed documentation link (#18783)
* Fix broken link

* Trigger CI

Co-authored-by: Stas Bekman <stas@stason.org>
2022-08-28 19:32:19 -07:00
21f6f58721 Fix incomplete outputs of FlaxBert (#18772)
* Fix incomplete FlaxBert outputs

* fix big_bird electra roberta
2022-08-26 21:04:18 +02:00
62ceb4d661 [Wav2vec2 + LM Test] Improve wav2vec2 with lm tests and make torch version dependent for now (#18749)
* add first generation tutorial

* remove generation

* make version dependent expected values

* Apply suggestions from code review

* Update tests/models/wav2vec2_with_lm/test_processor_wav2vec2_with_lm.py

* fix typo
2022-08-26 14:11:55 +02:00
8869bf41fe [VisionEncoderDecoder] Add gradient checkpointing (#18697)
* add first generation tutorial

* VisionEnocderDecoder gradient checkpointing

* remove generation

* add tests
2022-08-26 14:11:27 +02:00
06a6a4bd51 CLI: Improved error control and updated hub requirement (#18752) 2022-08-25 17:08:05 +01:00
e9442440fc streamlining 'checkpointing_steps' parsing (#18755) 2022-08-25 11:00:38 -04:00
fbf382c84d Determine framework automatically before ONNX export (#18615)
* Automatic detection for framework to use when exporting to ONNX

* Log message change

* Incorporating PR comments, adding unit test

* Adding tf for pip install for run_tests_onnxruntime CI

* Restoring past changes to circleci yaml and test_onnx_v2.py, tests moved to tests/onnx/test_features.py

* Fixup

* Adding test to fetcher

* Updating circleci config to log more

* Changing test class name

* Comment typo fix in tests/onnx/test_features.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Moving torch_str/tf_str to self.framework_pt/tf

* Remove -rA flag in circleci config

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-08-25 16:31:34 +02:00
3223d49354 Add ONNX support for Longformer (#17176)
* Implement ONNX support for Longformer

Fix repo consistency check complaints

Fix value mismatches

Add pooler output for default model

Increase validation atol to accommodate multiple-choice error

Fix copies

Fix chunking for longer sequence lengths

Add future comment

* Fix issue in mask_invalid_locations

* Remove torch imports in configuration_longformer

* Change config access to fix LED

* Push opset version to support tril

* Work in review comments (mostly style)

* Add Longformer to ONNX tests
2022-08-25 08:34:42 +02:00
c55d6e4e10 examples/run_summarization_no_trainer: fixed incorrect param to hasattr (#18720)
* fixed incorrect param to hasattr

* simplified condition checks

* code cleanup
2022-08-24 12:12:42 -04:00
6667b0d7bf add warning to let the user know that the __call__ method is faster than encode + pad for a fast tokenizer (#18693)
* add warning to let the user know that the  method is slower that  for a fast tokenizer

* user warnings

* fix layoutlmv2

* fix layout*

* change warnings into logger.warning
2022-08-24 06:27:56 -04:00
dcff504e18 fixed docstring typos (#18739)
* fixed docstring typos

* Added missing colon

Co-authored-by: 김주영 <juyoung@zezedu.com>
2022-08-24 06:20:27 -04:00
e49c71fc4c Bump nbconvert from 6.3.0 to 6.5.1 in /examples/research_projects/lxmert (#18742)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.3.0 to 6.5.1.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.3.0...6.5.1)

---
updated-dependencies:
- dependency-name: nbconvert
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-24 06:12:56 -04:00
5b24949669 Bump nbconvert in /examples/research_projects/visual_bert (#18741)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.3.0 to 6.5.1.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.3.0...6.5.1)

---
updated-dependencies:
- dependency-name: nbconvert
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-24 06:12:48 -04:00
c72d7d91bf Add TF implementation of XGLMModel (#16543)
* Add TFXGLM models 

* Add todo: self.supports_xla_generation = False

Co-authored-by: Daniel Stancl <stancld@Daniels-MacBook-Pro.local>
Co-authored-by: Daniel Stancl <stancld@daniels-mbp.home>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Daniel <daniel.stancl@rossum.ai>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-08-24 10:51:05 +01:00
cecf9f9b27 fix pipeline_tutorial.mdx doctest (#18717)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-24 05:38:03 -04:00
a442884b87 Add minor doc-string change to include hp_name param in hyperparameter_search (#18700)
* Add minor doc-string change to include hp_name

* fix: missing type-information for kwargs

* fix: missing white-space in hyperparameter_search doc-strings
2022-08-24 05:07:17 -04:00
c12dbdc246 Update perf_infer_gpu_many.mdx (#18744) 2022-08-24 10:37:52 +02:00
6faf283288 CLI: Don't check the model head when there is no model head (#18733) 2022-08-23 15:38:59 +01:00
438698085c improve add_tokens docstring (#18687)
* improve add_tokens documentation

* format
2022-08-23 07:23:51 -04:00
891704b3c2 Removing warning of model type for microsoft/tapex-base-finetuned-wtq (#18711)
and friends.
2022-08-23 13:17:06 +02:00
84beb8a49b Unpin detectron2 (#18727)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-23 11:10:07 +02:00
d90a36d192 remove check for main process for trackers initialization (#18706) 2022-08-22 11:16:27 -04:00
0f257a8774 Add missing tokenizer tests - Longformer (#17677) 2022-08-22 12:13:20 +02:00
3fa45dbd91 Fix Data2VecVision ONNX test (#18587)
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-22 11:28:23 +02:00
30992ef0d9 [Hotfix] pin detectron2 5aeb252 to avoid test fix (#18701)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-20 00:37:38 +02:00
1f3c2282b5 Temp fix for broken detectron2 import (#18699)
* add first generation tutorial

* [Circle CI] Temporary fix for broken detectron2 import

* remove generation
2022-08-19 22:55:33 +02:00
e95d433d77 Generate: add missing **model_kwargs in sample tests (#18696) 2022-08-19 16:14:27 +01:00
e54a1b49aa model.tie_weights() should be applied after accelerator.prepare() (#18676)
* `model.tie_weights()` should be applied after `accelerator.prepare`

Weight tying should be done after the model has been moved to XLA device as mentioned on PyTorch/XLA Troubleshooting guide [here](https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#xla-tensor-quirks)

* format code
2022-08-18 13:46:57 -04:00
bbbb453e58 Add an examples folder for code downstream tasks (#18679)
* add examples subfolder

* mention examples in codeparrot readme

* use Trainer optimizer and scheduler type and add output_dir as argument

* add example of text-to-python and python-to-text models

* mention the downstream examples in the readme

* fix typo
2022-08-18 18:24:24 +02:00
a123eee9df [bnb] Move documentation (#18671)
* fix bnb documentation

- move bnb documentation to `infer_gpu_many`

* small refactoring

- added text on infer_gpu_one
- added a small note on infer_gpu_many
- added customized multi gpu example on infer_gpu_many

* Update docs/source/en/perf_infer_gpu_many.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* apply suggestions

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-08-18 17:34:48 +02:00
358fc18613 Add evaluate to examples requirements (#18666) 2022-08-18 10:57:39 -04:00
d243112b65 Fix breaking change in onnxruntime for ONNX quantization (#18336)
* Fix quantization

* Save model

* Remove unused comments

* Fix formatting
2022-08-18 10:06:16 -04:00
5987c637ee Fix repo consistency (#18682) 2022-08-18 09:47:50 -04:00
76454b08c8 Rename second input dimension from "sequence" to "num_channels" for CV models (#17976) 2022-08-18 15:13:54 +02:00
780253ce3d Rename method to avoid clash with property (#18677) 2022-08-18 12:56:27 +01:00
2c947d2939 Ping detectron2 for CircleCI tests (#18680)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-18 12:57:18 +02:00
a541d97477 Generate: validate model_kwargs on FLAX (and catch typos in generate arguments) (#18653) 2022-08-18 10:56:21 +01:00
0ea53822f8 [LongT5] Correct docs long t5 (#18669)
* add first generation tutorial

* [LongT5 Docs] Correct docs

* correct expected string

* remove incorrect file
2022-08-18 10:03:50 +02:00
582c537175 Allow users to force TF availability (#18650)
* Allow users to force TF availability

* Correctly name the envvar!
2022-08-18 03:09:09 -04:00
49e44b216b Update feature extractor methods to enable type cast before normalize (#18499)
* Update methods to optionally rescale
This is necessary to allow for casting our images / videos to numpy arrays within the feature extractors' call. We want to do this to make sure the behaviour is as expected when flags like  are False. If some transformations aren't applied, then the output type can't be unexpected e.g. a list of PIL images instead of numpy arrays.

* Cast images to numpy arrays in call to enable consistent behaviour with different configs

* Remove accidental clip changes

* Update tests to reflect the scaling logic
We write a generic  function to handle rescaling of our arrays. In order for the API to be intuitive, we take some factor c and rescale the image values by that. This means, the rescaling done in normalize and to_numpy_array are now done with array * (1/255) instead of array / 255. This leads to small differences in the resulting image. When testing, this was in the order of 1e-8, and so deemed OK
2022-08-17 19:57:07 +01:00
86d0b26d6c Fix matmul inputs dtype (#18585) 2022-08-17 15:59:43 +02:00
c99e984657 Fix Yolos ONNX export test (#18606)
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-17 10:04:49 +02:00
358478e729 Examples: add Bloom support for token classification (#18632)
* examples: add Bloom support for token classification (FLAX, PyTorch and TensorFlow)

* examples: remove support for Bloom in token classication (FLAX and TensorFlow currently have no support for it)
2022-08-17 09:50:57 +02:00
6d175c1129 [bnb] Minor modifications (#18631)
* bnb minor modifications

- refactor documentation
- add troubleshooting README
- add PyPi library on DockerFile

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

* Apply suggestions from code review

* put in one block

- put bash instructions in one block

* update readme

- refactor a bit hardware requirements

* change text a bit

* Apply suggestions from code review

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* apply suggestions

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* add link to paper

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update tests/mixed_int8/README.md

* Apply suggestions from code review

* refactor a bit

* add instructions Turing & Amperer

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add A6000

* clarify a bit

* remove small part

* Update tests/mixed_int8/README.md

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2022-08-17 00:48:10 +02:00
25e651a2de Update run_translation_no_trainer.py (#18637)
* Update run_translation_no_trainer.py

found an error in selecting `no_decay` parameters and some small modifications when the user continues to train from a checkpoint

* fixs `no_decay` and `resume_step` issue

1. change `no_decay` list
2. if use continue to train their model from provided checkpoint, the `resume_step` will not be initialized properly if `args.gradient_accumulation_steps != 1`
2022-08-16 13:25:57 -04:00
a27195b1de Update longt5.mdx (#18634) 2022-08-16 10:20:46 -05:00
fd9aa82b07 TF: Fix generation repetition penalty with XLA (#18648) 2022-08-16 13:30:52 +01:00
81ab11124f Add checks for some workflow jobs (#18583)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-16 13:53:47 +02:00
510c2a0b32 Change scheduled CIs to use torch 1.12.1 (#18644)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-16 13:41:37 +02:00
9cf274685a mac m1 mps integration (#18598)
* mac m1 `mps` integration

* Update docs/source/en/main_classes/trainer.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* addressing comments

* Apply suggestions from code review

Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>

* resolve comment

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Saattrup Nielsen <47701536+saattrupdan@users.noreply.github.com>
2022-08-16 16:34:51 +05:30
d6eeb87170 Flax Remat for LongT5 (#17994)
* [Flax] Add remat (gradient checkpointing)

* fix variable naming in test

* flip: checkpoint using a method

* fix naming

* fix class naming

* apply PVP's suggestions from code review

* add gradient_checkpointing to examples

* Add gradient_checkpointing to run_mlm_flax

* Add remat to longt5

* Add gradient checkpointing test longt5

* Fix args errors

* Fix remaining tests

* Make fixup & quality fixes

* replace kwargs

* remove unecessary kwargs

* Make fixup changes

* revert long_t5_flax changes

* Remove return_dict and copy to LongT5

* Remove test_gradient_checkpointing

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
2022-08-14 16:27:13 +01:00
1ccd2515ed small change (#18584) 2022-08-12 20:04:38 +02:00
b3ff7c680c [fsmt] deal with -100 indices in decoder ids (#18592)
* [fsmt] deal with -100 indices in decoder ids

Fixes: https://github.com/huggingface/transformers/issues/17945

decoder ids get the default index -100, which breaks the model - like t5 and many other models add a fix to replace -100 with the correct pad index. 

For some reason this use case hasn't been used with this model until recently - so this issue was there since the beginning it seems.

Any suggestions to how to add a simple test here? or perhaps we have something similar already? user's script is quite massive.

* style
2022-08-12 10:50:52 -07:00
37c5991843 [doc] fix anchors (#18591)
the manual anchors end up being duplicated with automatically added anchors and no longer work.
2022-08-12 10:49:59 -07:00
56ef0ba447 Update BLOOM parameter counts (#18531)
* Update BLOOM parameter counts

* Update BLOOM parameter counts
2022-08-12 19:36:18 +02:00
153d1361c7 Fix URLs (#18604)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-12 18:52:49 +02:00
2ab790e82d Add Donut (#18488)
* First draft

* Improve script

* Update script

* Make conversion work

* Add final_layer_norm attribute to Swin's config

* Add DonutProcessor

* Convert more models

* Improve feature extractor and convert base models

* Fix bug

* Improve integration tests

* Improve integration tests and add model to README

* Add doc test

* Add feature extractor to docs

* Fix integration tests

* Remove register_buffer

* Fix toctree and add missing attribute

* Add DonutSwin

* Make conversion script work

* Improve conversion script

* Address comment

* Fix bug

* Fix another bug

* Remove deprecated method from docs

* Make Swin and Swinv2 untouched

* Fix code examples

* Fix processor

* Update model_type to donut-swin

* Add feature extractor tests, add token2json method, improve feature extractor

* Fix failing tests, remove integration test

* Add do_thumbnail for consistency

* Improve code examples

* Add code example for document parsing

* Add DonutSwin to MODEL_NAMES_MAPPING

* Add model to appropriate place in toctree

* Update namespace to appropriate organization

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-12 16:40:58 +02:00
a5ca56ff15 Supporting seq2seq models for bitsandbytes integration (#18579)
* Supporting seq2seq models for `bitsandbytes` integration

- `bitsandbytes` integration supports now seq2seq models
- check if a model has tied weights as an additional check

* small modification

- tie the weights before looking at tied weights!
2022-08-12 16:15:09 +02:00
ed1924e801 Generate: validate model_kwargs (and catch typos in generate arguments) (#18261)
* validate generate model_kwargs

* generate tests -- not all models have an attn mask
2022-08-12 14:53:51 +01:00
2156619f10 Add TFAutoModelForSemanticSegmentation to the main __init__.py (#18600)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-12 15:10:00 +02:00
4eed2beca0 FSDP bug fix for load_state_dict (#18596) 2022-08-12 08:48:37 -04:00
d344534bf6 typos (#18594) 2022-08-12 08:40:53 -04:00
3cdaea47ec update doc for perf_train_cpu_many, add intel mpi introduction (#18576)
* update doc for perf_train_cpu_many, add mpi introduction

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Update docs/source/en/perf_train_cpu_many.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/perf_train_cpu_many.mdx

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-08-12 08:36:27 -04:00
46d09410eb Add type hints for ViLT models (#18577)
* Add type hints for Vilt models

* Add missing return type for TokenClassification class
2022-08-12 12:11:28 +01:00
bce36ee065 Load sharded pt to flax (#18419)
* initial commit

* add small test

* add cross pt tf flag to test

* fix quality

* style

* update test with new repo

* fix failing test

* update

* fix wrong param ordering

* style

* update based on review

* update related to recent new caching mechanism

* quality

* Update based on review

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

* quality and style

* Update src/transformers/modeling_flax_utils.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-08-12 09:48:10 +02:00
c8b6ae858d Return the permuted hidden states if return_dict=True (#18578) 2022-08-11 17:32:11 +01:00
f28f240828 fix owlvit tests, update docstring examples (#18586) 2022-08-11 19:10:25 +03:00
05d3a43c59 Bump nbconvert in /examples/research_projects/visual_bert (#18566)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)

---
updated-dependencies:
- dependency-name: nbconvert
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-11 10:47:31 -04:00
713ab6fde5 Bump nbconvert from 6.0.1 to 6.3.0 in /examples/research_projects/lxmert (#18565)
Bumps [nbconvert](https://github.com/jupyter/nbconvert) from 6.0.1 to 6.3.0.
- [Release notes](https://github.com/jupyter/nbconvert/releases)
- [Commits](https://github.com/jupyter/nbconvert/compare/6.0.1...6.3.0)

---
updated-dependencies:
- dependency-name: nbconvert
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-11 10:47:19 -04:00
c23cbdff4c Fix docstrings with last version of hf-doc-builder styler (#18581)
* Fix docstrings with last version of hf-doc-builder styler

* Remove empty Parameter block
2022-08-11 10:35:47 -04:00
42b8940b34 [FX] _generate_dummy_input supports audio-classification models for labels (#18580)
* Support audio classification architectures for labels generation, as well as provides a flag to print warnings or not

* Use ENV_VARS_TRUE_VALUES
2022-08-11 16:34:44 +02:00
d53dffec6e Deberta V2: Fix critical trace warnings to allow ONNX export (#18272)
* Fix critical trace warnings to allow ONNX export

* Force input to `sqrt` to be float type

* Cleanup code

* Remove unused import statement

* Update model sew

* Small refactor

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Use broadcasting instead of repeat

* Implement suggestion

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* Match deberta v2 changes in sew_d

* Improve code quality

* Update code quality

* Consistency of small refactor

* Match changes in sew_d

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
2022-08-11 09:54:43 -04:00
5d3f037433 german docs translation (#18544)
* Create _config.py

* Create _toctree.yml

* Create index.mdx

not sure about "du / ihr" oder "sie"

* Create quicktour.mdx

* Update _toctree.yml

* Update build_documentation.yml

* Update build_pr_documentation.yml

* fix build

* Update index.mdx

* Update quicktour.mdx

* Create installation.mdx

* Update _toctree.yml
2022-08-11 09:52:27 -04:00
80468251bc Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training (#18486)
* changing BartLearnedPositionalEmbedding forward signature and references to it

* removing debugging dead code (thanks style checker)

* blackened modeling_bart file

* removing copy inconsistencies via make fix-copies

* changing references to copied signatures in Bart variants

* make fix-copies once more

* using expand over repeat (thanks @michaelbenayoun)

* expand instead of repeat for all model copies

Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
2022-08-11 09:45:04 -04:00
3f0707b2fe Skip broken tests 2022-08-11 09:33:41 -04:00
4c8ec66a74 Fix LayoutLMv3 documentation (#17932)
* fix typos

* fix sequence_length docs of LayoutLMv3Model

* delete trailing white spaces

* fix layoutlmv3 docs more

* apply make fixup & quality

* change to two versions of input docstring

* apply make fixup & quality
2022-08-11 08:51:39 -04:00
f762f373cc Fix resizing bug in OWL-ViT (#18573)
* Fixes resizing bug in OWL-ViT
* Defaults to square resize if size is set to an int
* Sets do_center_crop default value to False
2022-08-11 15:44:23 +03:00
76568d24b6 Segformer TF: fix output size in documentation (#18572)
* Segformer TF: fix output size in doc

* Segformer pytorch: fix output size in doc

Co-authored-by: Maxime Gardoni <maxime.gardoni@ecorobotix.com>
2022-08-11 10:59:37 +02:00
051311ff66 fix string (#18568) 2022-08-10 15:28:19 -07:00
9a9a525be8 raise atol for MT5OnnxConfig (#18560)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-10 22:41:58 +02:00
f62cb8313c Adds CLIP to models exportable with ONNX (#18515)
* onnx config for clip

* default opset as 14

* changes from the original repo

* input values order fix

* outputs fix

* remove unused import

* ran make fix-copies

* black format

* review comments: forward ref, import fix, model change revert, .to cleanup

* make style

* formatting fixes

* revert groupvit

* comment for cast to int32

* comment fix

* make .T as .t() for onnx conversion

* ran make fix-copies

* remove unneeded comment

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix copies

* remove comment

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-08-10 15:47:31 -04:00
50949fab74 Properly move cache when it is not in default path (#18563) 2022-08-10 15:46:03 -04:00
6936e7c487 Update philosophy to include other preprocessing classes (#18550)
* 📝 update philosophy to include other preprocessing classes

* 🖍 apply feedbacks
2022-08-10 13:20:39 -05:00
9d4a45509a pipeline support for device="mps" (or any other string) (#18494)
* `pipeline` support for `device="mps"` (or any other string)

* Simplify `if` nesting

* Update src/transformers/pipelines/base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix? @sgugger

* passing `attr=None` is not the same as not passing `attr` 🤯

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-08-10 18:52:15 +02:00
0d0aada564 Use commit hash to look in cache instead of calling head (#18534)
* Use commit hash to look in cache instead of calling head

* Add tests

* Add attr for local configs too

* Stupid typos

* Fix tests

* Update src/transformers/utils/hub.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address Julien's comments

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2022-08-10 11:55:18 -04:00
6eb51450fa TF Examples Rewrite (#18451)
* Finished QA example

* Dodge a merge conflict

* Update text classification and LM examples

* Update NER example

* New Keras metrics WIP, fix NER example

* Update NER example

* Update MC, summarization and translation examples

* Add XLA warnings when shapes are variable

* Make sure batch_size is consistently scaled by num_replicas

* Add PushToHubCallback to all models

* Add docs links for KerasMetricCallback

* Add docs links for prepare_tf_dataset and jit_compile

* Correct inferred model names

* Don't assume the dataset has 'lang'

* Don't assume the dataset has 'lang'

* Write metrics in text classification

* Add 'framework' to TrainingArguments and TFTrainingArguments

* Export metrics in all examples and add tests

* Fix training args for Flax

* Update command line args for translation test

* make fixup

* Fix accidentally running other tests in fp16

* Remove do_train/do_eval from run_clm.py

* Remove do_train/do_eval from run_mlm.py

* Add tensorflow tests to circleci

* Fix circleci

* Update examples/tensorflow/language-modeling/run_mlm.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/test_tensorflow_examples.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/translation/run_translation.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update examples/tensorflow/token-classification/run_ner.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix save path for tests

* Fix some model card kwargs

* Explain the magical -1000

* Actually enable tests this time

* Skip text classification PR until we fix shape inference

* make fixup

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2022-08-10 16:49:51 +01:00
d7e2d7b40b Preserve hub-related kwargs in AutoModel.from_pretrained (#18545)
* Preserve hub-related kwargs in AutoModel.from_pretrained

* Fix tests

* Remove debug statement
2022-08-10 08:00:18 -04:00
34aad0dac0 TF: XLA-trainable DeBERTa v2 (#18546)
* fix deberta issues

* add different code paths for gpu and tpu

* shorter gpu take along axis

* Stable Dropout without tf cond

* variable must be float
2022-08-10 12:57:21 +01:00
4a51075a96 bitsandbytes - Linear8bitLt integration into transformers models (#17901)
* first commit

* correct replace function

* add final changes

- works like charm!
- cannot implement tests yet
- tested

* clean up a bit

* add bitsandbytes dependencies

* working version

- added import function
- added bitsandbytes utils file

* small fix

* small fix

- fix import issue

* fix import issues

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor a bit

- move bitsandbytes utils to utils
- change comments on functions

* reformat docstring

- reformat docstring on init_empty_weights_8bit

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* revert bad formatting

* change to bitsandbytes

* refactor a bit

- remove init8bit since it is useless

* more refactoring

- fixed init empty weights issue
- added threshold param

* small hack to make it work

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

* revmoe the small hack

* modify utils file

* make style + refactor a bit

* create correctly device map

* add correct dtype for device map creation

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

- remove with torch.grad
- do not rely on Python bool magic!

* add docstring

 - add docstring for new kwargs

* add docstring

- comment `replace_8bit_linear` function
- fix weird formatting

* - added more documentation
- added new utility function for memory footprint tracking
- colab demo to add

* few modifs

- typo doc
- force cast into float16 when load_in_8bit is enabled

* added colab link

* add test architecture + docstring a bit

* refactor a bit testing class

* make style + refactor a bit

* enhance checks

- add more checks
- start writing saving test

* clean up a bit

* male style

* add more details on doc

* add more tests

- still needs to fix 2 tests

* replace by "or"

- could not fix it from GitHub GUI

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refactor a bit testing code + add readme

* make style

* fix import issue

* Update src/transformers/modeling_utils.py

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

* add few comments

* add more doctring + make style

* more docstring

* raise error when loaded in 8bit

* make style

* add warning if loaded on CPU

* add small sanity check

* fix small comment

* add bitsandbytes on dockerfile

* Improve documentation

- improve documentation from comments

* add few comments

* slow tests pass on the VM but not on the CI VM

* Fix merge conflict

* make style

* another test should pass on a multi gpu setup

* fix bad import in testing file

* Fix slow tests

- remove dummy batches
- no more CUDA illegal memory errors

* odify dockerfile

* Update docs/source/en/main_classes/model.mdx

* Update Dockerfile

* Update model.mdx

* Update Dockerfile

* Apply suggestions from code review

* few modifications

- lm head can stay on disk/cpu
- change model name so that test pass

* change test value

- change test value to the correct output
- torch bmm changed to baddmm in bloom modeling when merging

* modify installation guidelines

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* replace `n`by `name`

* merge `load_in_8bit` and `low_cpu_mem_usage`

* first try - keep the lm head in full precision

* better check

- check the attribute `base_model_prefix` instead of computing the number of parameters

* added more tests

* Update src/transformers/utils/bitsandbytes.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Merge branch 'integration-8bit' of https://github.com/younesbelkada/transformers into integration-8bit

* improve documentation

- fix typos for installation
- change title in the documentation

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
2022-08-10 09:13:36 +02:00
8cf4a6f0a6 📝 update documentation build section (#18548) 2022-08-09 18:22:55 -05:00
38a674599c Clean up comment 2022-08-09 15:15:01 -04:00
5e2f373705 Restore _init_weights value in no_init_weights (#18504)
* Recover _init_weights value in no_init_weights

For potential nested use. 
In addition, users might modify private no_init_weights as well.

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove private variable change check

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-08-09 14:23:30 -04:00
0c183cc2f4 📝 update metric with evaluate (#18535) 2022-08-09 11:58:11 -05:00
9f5fe63548 Adding a new align_to_words param to qa pipeline. (#18010)
* Adding a new `align_to_words` param to qa pipeline.

* Update src/transformers/pipelines/question_answering.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Import protection.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-08-09 18:50:02 +02:00
ab2006e3d6 BART - Fix attention mask device issue on copied models (#18540)
* attempt to fix attn mask device

* fix bart `_prepare_decoder_attention_mask`

- add correct device
- run `make fix-copies` to propagate the fix
2022-08-09 14:47:18 +02:00
6bea7b8178 Minor update of run_call_with_unpacked_inputs (#18541)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-09 14:33:41 +02:00
8cb5ecd912 Add mt5 onnx config (#18394)
* update features

* MT5OnnxConfig added with updated with tests and docs

* fix imports

* fix onnc_config_cls for mt5

Co-authored-by: Thomas Chaigneau <thomas.deeptools.ai>
2022-08-09 03:46:53 -04:00
fe785730dc fix: data2vec-vision Onnx ready-made configuration. (#18427)
* feat: add the data2vec conf that are missing https://huggingface.co/docs/transformers/serialization

* fix: wrong config
2022-08-09 03:35:05 -04:00
ab62a23d8c Let's not cast them all (#18471)
* add correct dtypes when checking for params dtype

* forward contrib credits

* Update src/transformers/modeling_utils.py

Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>

* more comments

- added more comments on why we cast only floating point parameters

* Update src/transformers/modeling_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: sgugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
2022-08-08 23:48:49 +02:00
499450ed75 Spanish translation of summarization.mdx (#15947) (#18477)
* Add Spanish translation of summarization.mdx

* Apply suggestions from code review

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-08-08 15:54:11 -04:00
ed70f24291 Add Spanish translation of converting_tensorflow_models.mdx (#18512)
* Add file in spanish docs to be translated

* Finish translation to Spanish

* Improve Spanish  wording

* Add suggested changes from review
2022-08-08 15:53:43 -04:00
a765b68aa6 Update no_trainer.py scripts to include accelerate gradient accumulation wrapper (#18473)
* Added accelerate gradient accumulation wrapper to run_image_classification_no_trainer.py example script

* make fixup changes

* PR comments

* changed input to Acceletor based on PR comment, ran make fixup

* Added comment explaining the sync_gradients statement

* Fixed lr scheduler max steps

* Changed run_clm_no_trainer.py script to use accelerate gradient accum wrapper

* Fixed all scripts except wav2vec2 pretraining to use accelerate gradient accum wrapper

* Added accelerate gradient accum wrapper for wav2vec2_pretraining_no_trainer.py script

* make fixup and lr_scheduler step inserted back into run_qa_beam_search_no_trainer.py

* removed changes to run_wav2vec2_pretraining_no_trainer.py script and fixed using wrong constant in qa_beam_search_no_trainer.py script
2022-08-08 15:52:47 -04:00
f1f5de31ed Update perf_train_gpu_one.mdx (#18532) 2022-08-08 20:33:34 +02:00
82bb682643 [VideoMAE] Add model to doc tests (#18523)
* Add videomae to doc tests

* Add pip install decord

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-08 19:28:51 +02:00
3632531ec6 Add example of multimodal usage to pipeline tutorial (#18498)
* 📝 add example of multimodal usage to pipeline tutorial

* 🖍 apply feedbacks

* 🖍 apply niels feedback
2022-08-08 11:31:31 -05:00
36b37990af update to use interlibrary links instead of Markdown (#18500) 2022-08-08 10:53:52 -05:00
ec8d26248f unpin resampy (#18527)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-08 17:44:10 +02:00
47e1676255 New cache fixes: add safeguard before looking in folders (#18522) 2022-08-08 10:22:27 -04:00
7495924007 Specify en in doc-builder README example (#18526)
Co-authored-by: Ankur Goyal <ankur@impira.com>
2022-08-08 10:22:17 -04:00
aff5117f46 Remove debug statement 2022-08-08 09:54:10 -04:00
70b0d4e193 Fix compatibility with 1.12 (#17925)
* Fix compatibility with 1.12

* Remove pin from examples requirements

* Update torch scatter version

* Fix compatibility with 1.12

* Remove pin from examples requirements

* Update torch scatter version

* fix torch.onnx.symbolic_opset12 import

* Reject bad version

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-08 09:53:08 -04:00
2fecde742d update fsdp docs (#18521)
* updating fsdp documentation

* typo fix
2022-08-08 18:56:51 +05:30
377cdded7a Clean up hub (#18497)
* Clean up utils.hub

* Remove imports

* More fixes

* Last fix
2022-08-08 08:48:10 -04:00
a4562552eb [DX fix] Fixing QA pipeline streaming a dataset. (#18516)
* [DX fix] Fixing QA pipeline streaming a dataset.

QuestionAnsweringArgumentHandler would iterate over the whole dataset
effectively killing all properties of the pipeline.
This restores nice properties when using `Dataset` or `Generator` since
those are meant to be consumed lazily.

* Handling TF better.
2022-08-08 14:25:56 +02:00
88a0ce57bb Add seed setting to image classification example (#18519) 2022-08-08 08:08:11 -04:00
9129fd0377 transformers-cli login => huggingface-cli login (#18490)
* zero chance anyone's using that constant no?

* `transformers-cli login` => `huggingface-cli login`

* `transformers-cli repo create` => `huggingface-cli repo create`

* `make style`
2022-08-06 09:42:55 +02:00
8d1f9039d0 Just re-reading the whole doc every couple of months 😬 (#18489)
* Delete valohai.yaml

* NLP => ML

* typo

* website supports https

* datasets

* 60k + modalities

* unrelated link fixing for accelerate

* Ok those links were actually broken

* Fix link

* Make `AutoTokenizer` auto-link

* wording tweak

* add at least one non-nlp task
2022-08-06 09:38:55 +02:00
b8c247b6d0 Typo reported by Joel Grus on TWTR (#18493) 2022-08-05 13:29:38 -04:00
38d656041b disable Onnx test for google/long-t5-tglobal-base (#18454)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-05 19:27:19 +02:00
56a55d3ce4 Forgot one new_ for cache migration 2022-08-05 13:24:53 -04:00
9d64f7f00c Update some expected values in quicktour.mdx for resampy 0.3.0 (#18484)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-05 19:17:51 +02:00
faacdf007b Move cache folder to huggingface/hub for consistency with hf_hub (#18492)
* Move cache folder to just huggingface

* Thank you VsCode for this needless import

* Move to hub

* Forgot one
2022-08-05 13:14:00 -04:00
280db2e39c Fix test_dbmdz_english by updating expected values (#18482)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-05 16:49:54 +02:00
5cd4032368 Use new huggingface_hub tools for download models (#18438)
* Draft new cached_file

* Initial draft for config and model

* Small fixes

* Fix first batch of tests

* Look in cache when internet is down

* Fix last tests

* Bad black, not fixing all quality errors

* Make diff less

* Implement change for TF and Flax models

* Add tokenizer and feature extractor

* For compatibility with main

* Add utils to move the cache and auto-do it at first use.

* Quality

* Deal with empty commit shas

* Deal with empty etag

* Address review comments
2022-08-05 10:12:40 -04:00
70fa1a8d26 Fix pipeline tests (#18487)
* Fix pipeline tests

* Make sure all pipelines tests run with init changes
2022-08-05 09:14:51 -04:00
c7849d9efc Remove py.typed (#18485) 2022-08-05 09:12:19 -04:00
893122f666 Add TF prefix to TF-Res test class (#18481)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-05 13:59:55 +02:00
bf174f916b Refactor TFSwinLayer to increase serving compatibility (#18352)
* Refactor `TFSwinLayer` to increase serving compatibility

Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>

* Fix missed parameters while refactoring

Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>

* Fix window_reverse to calculate batch size

Signed-off-by: Seunghwan Hong <harrydrippin@gmail.com>
Co-Authored-By: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2022-08-05 07:40:14 -04:00
575aa6ef1a Fix TFSwinSelfAttention to have relative position index as non-trainable weight (#18226)
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
2022-08-05 07:39:40 -04:00
586dcf6b21 Fixing issue where generic model types wouldn't load properly with the pipeline (#18392)
* Adding a better error message when the model is improperly configured

within transformers.

* Update src/transformers/pipelines/__init__.py

* Black version.

* Overriding task aliases so that tokenizer+feature_extractor

values are correct.

* Fixing task aliases by overriding their names early

* X.

* Fixing feature-extraction.

* black again.

* Normalizing `translation` too.

* Fixing last few corner cases.

translation need to use its non normalized name (translation_XX_to_YY,
so that the task_specific_params are correctly overloaded).
This can be removed and cleaned up in a later PR.

`speech-encode-decoder` actually REQUIRES to pass a `tokenizer` manually
so the error needs to be discarded when the `tokenizer` is already
there.

* doc-builder fix.

* Fixing the real issue.

* Removing dead code.

* Do not import the actual config classes.
2022-08-05 08:45:07 +02:00
14928921e2 Add TF_MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING (#18469)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-04 20:41:15 +02:00
0bf1e1aca4 Update no trainer examples for QA and Semantic Segmentation (#18474)
* swag_no_trainer updated for with gather_metrics

* Removed unused variable samples_seen

* updated examples with gather_for_metrics
2022-08-04 13:22:19 -04:00
d2704c4143 Add machine type in the artifact of Examples directory job (#18459)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-04 18:52:01 +02:00
f9a0008d2d Add VideoMAE (#17821)
* First draft

* Add VideoMAEForVideoClassification

* Improve conversion script

* Add VideoMAEForPreTraining

* Add VideoMAEFeatureExtractor

* Improve VideoMAEFeatureExtractor

* Improve docs

* Add first draft of model tests

* Improve VideoMAEForPreTraining

* Fix base_model_prefix

* Make model take pixel_values of shape (B, T, C, H, W)

* Add loss computation of VideoMAEForPreTraining

* Improve tests

* Improve model testsé

* Make all tests pass

* Add VideoMAE to main README

* Add tests for VideoMAEFeatureExtractor

* Add integration test

* Improve conversion script

* Rename patch embedding class

* Remove VideoMAELayer from init

* Update design of patch embeddings

* Improve comments

* Improve conversion script

* Improve conversion script

* Add conversion of pretrained model

* Add loss verification of pretrained model

* Add loss verification of unnormalized targets

* Add integration test for pretraining model

* Apply suggestions from code review

* Fix bug to make feature extractor resize only shorter edge

* Address more comments

* Improve normalization of videos

* Add doc examples

* Move constants to dedicated script

* Remove scripts

* Transfer checkpoints, fix docs

* Update script

* Update image mean and std

* Fix doc tests

* Set return_tensors to NumPy by default

* Revert the previous change

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-04 18:02:55 +02:00
672b66262a Add FX support for torch.baddbmm andd torch.Tensor.baddbmm (#18363) 2022-08-04 16:02:16 +02:00
df28de0581 Fix load of model checkpoints in the Trainer (#18470) 2022-08-04 08:22:25 -04:00
330247ede2 Update no trainer scripts for multiple-choice (#18468)
* swag_no_trainer updated for with gather_metrics

* Removed unused variable samples_seen
2022-08-04 07:29:32 -04:00
c74befc9e3 HFTracer.trace can now take callables and torch.nn.Module (#18457)
* Enable HFTracer to trace with custom dummy inputs instead of pre-computed ones

* Add HFTracer.trace docstring, and make it possible to handle callable and torch.nn.Module in general

* Remove pdb comment

* Apply suggestions
2022-08-04 13:29:18 +02:00
fc1d841b2d change shape to support dynamic batch input in tf.function XLA generate for tf serving (#18372)
* change shape to support dynamic batch input in tf.generate

* add tests

Co-authored-by: nlpcatcode <nlpcodecat@gmail.com>
2022-08-04 11:26:11 +01:00
b69a62d579 [BLOOM] Clean modeling code (#18344)
* Cleanup some code

* Improve signatures

* Try to reduce the number of reshape/copies

* I don't think we actually need the layer_num scaling trick

* No need for duplication

* Try to fix beam_search

* Fix beam search

* Removing layer num normalization seems to be breaking

* Not sure self.layer_number normalization actually matters

* Try and be backward compatible

* Try to fix beam_search

* Revert attempt to be backward compatible

* Improve documentation on past_key_values format

* Optimize the device allocation in case of hidden_states in multiple devices

* No need to manually cast the values to a specific device

* Rename with long version of variables

* Improve type hinting

* Add comment that explains that some methods return views

* Actually i think the attention casting only makes sense when we use torch.float16

* We don't actually need layer_number to be passed anymore

* Fix FX test

* Bypass torch.baddbmm

* Apply suggestions from code review

* Add comment about support for torchScript v1.11

* fix ONNX support for bloom (#18456)

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
2022-08-04 11:08:03 +02:00
02b176c4ce Fix torch version comparisons (#18460)
Comparisons like
version.parse(torch.__version__) > version.parse("1.6")
are True for torch==1.6.0+cu101 or torch==1.6.0+cpu

version.parse(version.parse(torch.__version__).base_version) are preferred (and available in pytorch_utils.py
2022-08-03 13:37:18 -04:00
be41eaf55f fix: keras fit tests for segformer tf and minor refactors. (#18412)
* fix: keras fit tests for segformer tf and minor refactors.

* refactor: test_keras_fit to make it simpler using the existing one.

* fix: styling issues.
2022-08-03 16:39:54 +01:00
fc546332d7 add zero-shot obj detection notebook to docs (#18453) 2022-08-03 17:14:39 +03:00
8fb7c908c8 Fix failing tests for XLA generation in TF (#18298)
* Fix failing test_xla_generate_slow tests

* Fix failing speech-to-text xla_generate tests
2022-08-03 09:45:15 -04:00
a507908cd3 Update pinned hhub version (#18448)
* Update pinned hhub version

* Make style
2022-08-03 08:37:42 -04:00
3db4378bd7 Update no trainer scripts for language modeling and image classification examples (#18443)
* Update no_trainer script for image-classification

* Update no_trainer scripts for language-modeling examples

* Remove unused variable

* Removing truncation from losses array for language modeling examples
2022-08-03 08:33:18 -04:00
10e1ec9a8c Add Spanish translation of run_scripts.mdx (#18415)
* Add file in spanish docs to be translated

* Translate first two sections to Spanish

* Translate four additional sections to Spanish

* Finish translation to Spanish

* Improve writing style in Spanish

* Add suggested changes from reviewer
2022-08-03 07:32:20 -04:00
9d7b70bcd7 support ONNX export of XDropout in deberta{,_v2} and sew_d (#17502)
* support ONNX export of XDropout in deberta{,_v2}

* black

* copy to sew_d

* add test

* isort

* use pytest.mark.filterwarnings

* review comments
2022-08-03 06:33:44 -04:00
92915ebec2 Update _toctree.yml (#18440)
This PR moves GroupViT and LXMert to their correct sections. As pointed out by @NielsRogge and @LysandreJik, GroupViT and LXMert are both multimodal models.
2022-08-03 12:26:01 +02:00
22a0dd2ef7 fixing error when using sharded ddp (#18435) 2022-08-03 08:39:58 +05:30
5096a654b7 Add programming languages (#18434)
The current wording makes it sound as if the programming languages are part of the 46 natural languages.
2022-08-02 16:02:25 -04:00
042f420364 Update pipeline word heuristic to work with whitespace in token offsets (#18402)
* Update pipeline word heuristic to work with whitespace in token offsets

This change checks for whitespace in the input string at either the
character preceding the token or in the first character of the token.
This works with tokenizers that return offsets excluding whitespace
between words or with offsets including whitespace.

fixes #18111

starting

* Use smaller model, ensure expected tokenization

* Re-run CI (please squash)
2022-08-02 15:31:01 -04:00
c382ed8a2f Accept trust_remote_code and ignore it in PreTrainedModel.from_pretrained (#18428)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-02 21:03:59 +02:00
dbd9641c8c Improve generate docstring (#18198)
* improve generate docstring

* Remove 'defaults to None' comment
2022-08-02 13:22:55 -04:00
5546fb61ab fix run_clip README (#18332)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-02 19:14:46 +02:00
2959d09072 Fix test_load_default_pipelines_tf test error (#18422)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-02 18:51:10 +02:00
8ae7784256 update maskformer docs (#18423)
* update maskformer docs

* fix typo
2022-08-02 18:43:58 +03:00
0b8c1b6994 Change audio kwarg to images in TROCR processor (#18421)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-02 15:04:45 +02:00
dd21fb378f Fix the hub user name in a longformer doctest checkpoint (#18418)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-02 15:04:10 +02:00
68a894a587 Fix uninitialized parameter in conformer relative attention. (#18368)
`torch.Tensor` creates an unitialized tensor (as via `torch.empty`), this leads to undeterministic behavior, poor initialization, and nans if you have unlucky init. The paper does not specify the initialization for bias terms, so I guess zero seems like a good choice - no bias initially. `torch.Tensor` is usually populated with zeros, so this fix will be close to the intended behavior:

```
>>> torch.Tensor(100, 100).sum()
tensor(0.)
>>> torch.Tensor(100, 100).sum()
tensor(nan)
>>> torch.Tensor(100, 100).sum()
tensor(0.)
```
2022-08-02 10:34:10 +01:00
df5e4232f5 fix: create a copy for tokenizer object (#18408) 2022-08-01 15:32:12 -04:00
24845aeb6d Layoutlmv2 tesseractconfig (#17733)
* Added option for users to modify config parameter used by pytesseract during feature extraction

- Added optional 'tess_config' kwarg when setting up LayoutLMV2 processor that is used by pytesseract during feature extraction
- Eg. Can be used to modify psm values by setting tess_config to '--psm 7'
- Different psm values significantly influences the output of layoutlmv2

* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Updated variable names to be more explicit

* Fixed styles

* Added option for users to modify config parameter when calling pytesseract during feature extraction

- Added option to set "tesseract_config" parameter during LayoutLMV3 processor initialization
- Can be used to modify PSM values, eg. by setting tesseract_config="--psm 6"

* Removed  from function signature

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-08-01 12:24:43 -04:00
151a2aaa4e Split model list on modality (#18328)
* 📝 split up model list

* Adapt script to reorg

* apply niels feedback

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-08-01 11:10:20 -05:00
01db72abd4 Rewrite push_to_hub to use upload_files (#18366)
* Rewrite push_to_hub to use upload_files

* Adapt the doc a bit

* Address review comments and clean doc
2022-08-01 12:07:30 -04:00
3909d7f139 Add Flax BART pretraining script (#18297)
* add bart pretraining flax script

* fixup

* add bart pretraining flax script

* add BART to README

* add BART to README

* add BART to README

* add BART to README

* add BART to README

* add bos eos document

* Update README.md

* Update README.md

* Update examples/flax/language-modeling/run_bart_dlm_flax.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* final

* final

* final

* remove use_auth_token ing from_config

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2022-08-01 12:06:30 -04:00
941d233153 Fix ROUGE add example check and update README (#18398)
* Fix ROUGE add example check and update README

* Stay consistent in values
2022-08-01 11:14:49 -04:00
62098b9348 Adding fine-tuning models to LUKE (#18353)
* add LUKE models for downstream tasks

* add new LUKE models to docs

* fix typos

* remove commented lines

* exclude None items from tuple return values
2022-08-01 11:09:47 -04:00
7b9e995b70 Fix docs (#18399)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-01 17:02:51 +02:00
e0bc4c73e8 Add balanced strategies for device_map in from_pretrained (#18349)
* Add balanced strategies for device_map in from_pretrained

* Add safeguards for Accelerate version

* Update src/transformers/modeling_utils.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Style

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-08-01 10:28:26 -04:00
39e76d76fd Fix doc tests (#18397)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-08-01 15:56:10 +02:00
1141371103 Fix OPT doc tests (#18365) 2022-08-01 15:19:45 +02:00
af1e6b4d87 Add evaluate to test dependencies (#18396) 2022-08-01 08:55:44 -04:00
bd6d1b4300 Add a check regarding the number of occurrences of ``` (#18389)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-08-01 14:23:02 +02:00
1cd7c6f154 Fix from_pretrained kwargs passing (#18387)
Fix #18385
I don't know whether `use_auth_token`, `cache_dir` and `local_files_only` should be passed to `(cls.slow_tokenizer_class)._from_pretrained`, but I guess it should.
2022-08-01 08:16:24 -04:00
96b5d7db9c Remove pt-like calls on tf tensor (#18393) 2022-08-01 13:06:30 +01:00
679d68a11b Correct the spelling of bleu metric (#18375) 2022-08-01 07:51:27 -04:00
1f84399171 Migrate metric to Evaluate in Pytorch examples (#18369)
* Migrate metric to Evaluate in pytorch examples

* Remove unused imports
2022-08-01 07:40:25 -04:00
25ec12eaf7 Bump mistune from 0.8.4 to 2.0.3 in /examples/research_projects/lxmert (#18370)
Bumps [mistune](https://github.com/lepture/mistune) from 0.8.4 to 2.0.3.
- [Release notes](https://github.com/lepture/mistune/releases)
- [Changelog](https://github.com/lepture/mistune/blob/master/docs/changes.rst)
- [Commits](https://github.com/lepture/mistune/compare/v0.8.4...v2.0.3)

---
updated-dependencies:
- dependency-name: mistune
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-01 04:46:57 -04:00
a7360385f4 Bump mistune in /examples/research_projects/visual_bert (#18371)
Bumps [mistune](https://github.com/lepture/mistune) from 0.8.4 to 2.0.3.
- [Release notes](https://github.com/lepture/mistune/releases)
- [Changelog](https://github.com/lepture/mistune/blob/master/docs/changes.rst)
- [Commits](https://github.com/lepture/mistune/compare/v0.8.4...v2.0.3)

---
updated-dependencies:
- dependency-name: mistune
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-01 04:46:31 -04:00
b2e4b091f0 fix FSDP ShardedGradScaler (#18358)
renaming it
2022-07-30 10:07:56 +05:30
51227e26ab Fix TFSegformerForSemanticSegmentation doctest (#18362)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-29 16:30:59 +02:00
4e2f4a92dd [FX] Symbolic trace for Bloom (#18356)
* Bloom model can now be traced

* Bloom traced model can be torch scripted and serialized

* Bloom can be traced with variable keyword arguments

* Enable XLNet support

* Disable XLNet for now
2022-07-29 16:12:27 +02:00
1763770bd9 Fix some doctests (#18359)
* Fix some doctests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-29 14:13:28 +02:00
986526a0e4 Replace as_target context managers by direct calls (#18325)
* Preliminary work on tokenizers

* Quality + fix tests

* Treat processors

* Fix pad

* Remove all uses of  in tests, docs and examples

* Replace all as_target_tokenizer

* Fix tests

* Fix quality

* Update examples/flax/image-captioning/run_image_captioning_flax.py

Co-authored-by: amyeroberts <amy@huggingface.co>

* Style

Co-authored-by: amyeroberts <amy@huggingface.co>
2022-07-29 08:09:09 -04:00
a64bcb564d Fix OwlViT torchscript tests (#18347)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-29 10:36:04 +02:00
a4ee463d95 [Docs] Fix Speech Encoder Decoder doc sample (#18346)
* [Docs] Fix Speech Encoder Decoder doc sample

* improve pre-processing comment

* make style
2022-07-29 09:11:28 +01:00
da503ea02f Migrate metrics used in flax examples to Evaluate (#18348)
Currently, tensorflow examples use the `load_metric` function from
Datasets library, commit migrates function call to `load` function
from Evaluate library.
2022-07-28 15:06:23 -04:00
a2586795e5 Migrate metric to Evaluate library for tensorflow examples (#18327)
* Migrate metric to Evaluate library in tf examples

Currently tensorflow examples use `load_metric` function from Datasets
library , commit migrates function call to `load` function to
Evaluate library.

Fix for #18306

* Migrate metric to Evaluate library in tf examples

Currently tensorflow examples use `load_metric` function from Datasets
library , commit migrates function call to `load` function to
Evaluate library.

Fix for #18306

* Migrate `metric` to Evaluate for all tf examples

Currently tensorflow examples use `load_metric` function from Datasets
library , commit migrates function call to `load` function to
Evaluate library.
2022-07-28 14:24:27 -04:00
7b0908769b [BLOOM] Deprecate position_ids (#18342) 2022-07-28 20:21:43 +02:00
9c336657a9 Include tensorflow-aarch64 as a candidate (#18345)
Co-authored-by: Ankur Goyal <ankur@impira.com>
2022-07-28 12:45:02 -04:00
b53dab601c Remove Flax OPT from doctest for now (#18338)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-28 11:50:44 -04:00
286a18fa00 Fix codeparrot deduplication - ignore whitespaces (#18023)
* ignore whitspaces for hash

* reformat code

* Update README.md
2022-07-28 15:58:26 +02:00
5d1fed0740 Update automatic_speech_recognition.py (#18339) 2022-07-28 09:53:03 -04:00
985c7e3ac9 Updated _toctree.yml (#18337) 2022-07-28 09:04:32 -04:00
a8e279579b updated translation (#18333)
Left the term fine-tuning since there is no correct translation into Italian and the English term is generally used. The same was done with some terms like "learning rate"
2022-07-28 08:14:15 -04:00
1e380c7dcb fixed typo (#18331) 2022-07-28 06:14:56 -04:00
96be1b7f49 Update feature extractor docs (#18324)
As pointed out by @NielsRogge, a feature extractor is used to prepare inputs for a model with a single modality rather than multimodal models.
2022-07-27 15:32:57 -05:00
2b81f72be9 start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch … (#18229)
* start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch and should import it before use

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add doc for perf_train_cpu_many

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update doc

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-07-27 11:15:41 -04:00
e87ac9d18b Add swin transformer v2 (#17469)
* Add files generated using transformer-cli add-new-model-like command

* Add changes for swinv2 attention and forward method

* Add fixes

* Add modifications for weight conversion and remaining args in swin model

* Add changes for patchmerging

* Add changes for SwinV2selfattention

* Update conversion script

* Add final fixes for the swin_v2 model

* Add changes for conversion script for pretrained window size case

* Add pretrained window size value from config in SwinV2Encoder class

* Make fixup

* Add swinv2 to models_not_in_readme to utils/check_copies.py

* Modify Swinv2v2 to Swin Transformer V2

* Remove copied from, to run make fixup command

* Add updates to swinv2tf from main branch

* Add pretrained_window_size to config, to make tests pass

* Add modified weights from nandwalritik profile for swinv2

* Update model weights from swinv2 from nandwalritik profile

* Add fix for build_pr_documentation CI fix

* Add fixes for weight conversion

* Add change to make input with padding work

* Add fixes for test cases

* Add few changes from swin to swinv2 to pass test cases

* Remove tests for tensorflow as swinv2 for TF is not added yet

* Overide test_pt_tf_model_equivalence function as TF implementation for swinv2 is not added yet

* Add modeling_tf_swinv2 to _ignore_modules as test file is removed for this one right now.

* Update docs url for swinv2 in README.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Undo changes for check_repo

* Update url in readme.md

* Remove overrided function to test pt_tf_model_equivalence

* Remove TF model imports for Swinv2 as its not implemented in this PR

* Add changes for index.mdx

* Add swinv2 papers link,abstract and contributors details

* Rename cpb_mlp to continous_position_bias_mlp

* Add tips for swinv2 model

* Update src/transformers/models/swinv2/configuration_swinv2.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/swinv2/configuration_swinv2.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix indentation for docstring example in src/transformers/models/swinv2/configuration_swinv2.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update import order in src/transformers/models/swinv2/configuration_swinv2.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add copyright statements in weights conversion script.

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Remove Swinv2 from models_not_in_readme

* Reformat code

* Remove TF implementation file for swinv2

* Update start docstring.

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add changes for docstring

* Update orgname for weights to microsoft

* Remove to_2tuple function

* Add copied from statements wherever applicable

* Add copied from to Swinv2ForMaskedImageModelling class

* Reformat code.

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add unittest.skip(with reason.) for test_inputs_embeds test case.

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add updates for test_modeling_swinv2.py

* Add @unittest.skip() annotation for clarity to create_and_test_config_common_properties function

* Add continuous_position_bias_mlp parameter to conversion script

* Add test for testing masked_image_modelling for swinv2

* Update Swinv2 to Swin Transformer v2 in docs/source/en/model_doc/swinv2.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update Swinv2 to Swin Transformer v2 in docs/source/en/model_doc/swinv2.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/swinv2.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/swinv2.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add suggested changes

* Add copied from to forward methods of Swinv2Stage and Swinv2Encoder

* Add push_to_hub flag to weight conversion script

* Change order or Swinv2DropPath class

* Add id2label mapping for imagenet 21k

* Add updated url for SwinV2 functions and classes used in implementation

* Update input_feature dimensions format, mentioned in comments.

Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

* Add suggested changes for modeling_swin2.py

* Update docs

* Remove create_and_test_config_common_properties function, as test_model_common_attributes is sufficient.

* Fix indentation.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add changes for making Nit objects in code style

* Add suggested changes

* Add suggested changes for test_modelling_swinv2

* make fix-copies

* Update docs/source/en/model_doc/swinv2.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-27 11:14:47 -04:00
c89a592e87 Dev version 2022-07-27 17:13:57 +02:00
7490a97cac [Flax] Fix incomplete batches in example scripts (#17863)
* [Flax] Fix incomplete batches in example scripts

* fix dataloader batching

* convert jnp batch idxs to np array

* add missing `pad_shard_unpad` to final prediction generate step

* only `pad_shard_unpad` at inference time

* merge conflicts

* remove incomplete batch step from eval

* fix run_qa.py

* add `pad_shard_unpad` to run_flax_ner.py

* add `pad_shard_unpad` to run_flax_glue.py

* add `pad_shard_unpad` to run_image_classification.py

* make style

* fix mlm flax eval batches

* remove redundant imports
2022-07-27 15:50:47 +01:00
9caf68a638 Owlvit test fixes (#18303)
* fix owlvit test assertion errors

* fix gpu test error

* remove redundant lines

* fix styling
2022-07-27 17:26:27 +03:00
0077360d67 Fix sacremoses sof dependency for Transformers XL (#18321)
* Fix sacremoses sof dependency for Transofmers XL

* Add function to the submodule init
2022-07-27 09:37:02 -04:00
5c5676cdf9 sentencepiece shouldn't be required for the fast LayoutXLM tokenizer (#18320) 2022-07-27 09:09:32 -04:00
cf32b2ee42 Remove all uses of six (#18318)
* Remove all uses of six

* fix quality
2022-07-27 08:39:09 -04:00
170fcaa604 Generalize decay_mask_fn to apply mask to all LayerNorm params (#18273)
* generalize decay_mask_fn to find all layernorm params

* fixup

* generalising decay_mask_fn
2022-07-27 12:23:57 +01:00
83d2d74509 fix loading from pretrained for sharded model with `torch_dtype="auto" (#18061) 2022-07-27 07:20:35 -04:00
7996ef74dd fix module order (#18312)
- put gelu before 4h to h
2022-07-27 07:06:01 -04:00
70e7d1d656 Fixes torch jit tracing for LayoutLMv2 model (re-open) (#18313)
* Fixes torch jit tracing for LayoutLMv2 model.
Pytorch seems to reuse memory for input_shape which caused a mismatch in shapes later in the forward pass.

* Fixed code quality

* avoid unneeded allocation of vector for shape
2022-07-27 06:38:40 -04:00
1d71ad8905 Update CodeParrot readme to include training in Megatron (#17798)
* add info about megatron training

* upload models and datasets from CodeParrot organization

* upload models and datasets from CodeParrot organization

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* fix typo and add comment about codeparrot vs megatron

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2022-07-27 11:59:08 +02:00
d5610b53fa [XLA] Improve t5 model performance (#18288) 2022-07-27 10:44:14 +02:00
e318cda9ee Apply type correction to TFSwinModelOutput (#18295)
Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
2022-07-27 04:35:56 -04:00
ccd4180f8a [EncoderDecoder] Improve docs (#18271)
* Improve docs

* Improve docs of speech one as well

* Apply suggestions from code review

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-07-27 10:08:59 +02:00
5dfec704da Remove duplicated line (#18310)
Removes a duplicated instantiation of device. I removed the second instance of the line to maintain code alignment with the GPT-J implementation of forward.
2022-07-27 04:00:47 -04:00
47c2af0951 [DETR] Improve code examples (#18262)
* Improve doc test

* Improve code example of segmentation model

* Apply suggestion

* Update src/transformers/models/detr/modeling_detr.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-27 09:54:41 +02:00
ee67e7ad4f patch for smddp import (#18244)
* add import

* format
2022-07-26 16:00:24 -04:00
68097dcce0 Fix Sylvain's nits on the original KerasMetricCallback PR (#18300)
* Fix Sylvain's nits on the original PR

* Update src/transformers/keras_callbacks.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Re-add "optional" to docstring

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-26 17:08:16 +01:00
6649133124 Add PYTEST_TIMEOUT for CircleCI test jobs (#18251)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-26 17:57:59 +02:00
a5d504834d Add Spanish translation of custom_models.mdx (#17807)
* Update index

* Translate to Spanish two sections from custom_models

* Translate to Spanish custom models documentation

* Fixing typos and grammatical errors

* Add requested changes from reviewer
2022-07-26 10:10:37 -04:00
7ea7eba39d Add Italian translation of sharing_custom_models.mdx (#17631)
* work in progress: custom_models

* Update custom_models.mdx

* Update custom_models.mdx

* Update _toctree.yml

* Update _toctree.yml

* Update custom_models.mdx

* Update custom_models.mdx

* Update _toctree.yml

* Update _toctree.yml

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-26 09:48:58 -04:00
c4c6b4dbda Add PyTorch 1.11 to past CI (#18302)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-26 15:47:23 +02:00
bbc28106e0 Add Italian translation of converting_tensorflow_models.mdx (#18283)
* Add Italian translation of converting_tensorflow_models.mdx

* Update _toctree.yml

* Update converting_tensorflow_models.mdx

* Update docs/source/it/_toctree.yml

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-26 08:37:34 -04:00
a649de5551 Raise a TF-specific error when importing Torch classes (#18280)
* Raise a TF-specific error when importing Torch classes

* Update src/transformers/utils/import_utils.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Add an inverse error for PyTorch users

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-07-26 13:28:59 +01:00
5e0ffd9183 [ create_a_model.mdx ] translate to pt (#18098)
* [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial

* Delete docs/source/pt-br directory

* [ fast_tokenizers.mdx ] - Continuing work on file

* [ fast_tokenizers.mdx ] - Continuing work on file

* Add fast tokenizers to _toctree.yml

* Eliminated config and toctree.yml

* Nits in fast_tokenizers.mdx

* Finishing create_a_model

* [ create_a_model.mdx ] finishing create a model in pt-br

* [ Changing _toctree.yml ] adding create a model in pt

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-07-26 08:01:08 -04:00
f58b9c0522 Update translation.mdx (#18169)
* Update translation.mdx

* update translation.mdx by running make style
2022-07-26 07:56:40 -04:00
b51695274a Add TFAutoModelForImageClassification to pipelines.py (#18292)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-26 13:44:54 +02:00
f374d3918f Adding type hints of TF:OpenAIGPT (#18263) 2022-07-26 12:30:06 +01:00
5bb211be6e Adding type hints of TF:CTRL (#18264) 2022-07-26 12:27:02 +01:00
c8ed1b8b59 Replace false parameter by a buffer (#18259) 2022-07-26 13:02:58 +02:00
2844c5de10 Fix ORTTrainer failure on gpt2 fp16 training (#18017)
* Ensure value and attn weights have the same dtype

* Remove prints

* Modify decision transformers copied from gpt2

* Nit device

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Fix style

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2022-07-26 04:14:08 -04:00
2b09650885 Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER) (#17924)
* Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER)

* Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER)

* provide classifier only text hidden states

* add test_for_token_classification

* Update src/transformers/models/vilt/modeling_vilt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vilt/modeling_vilt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vilt/modeling_vilt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/vilt/modeling_vilt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* add test_for_token_classification

Co-authored-by: gfuchs <gfuchs@ebay.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-07-26 10:11:32 +02:00
002915aa2a Owlvit docs test (#18257)
* fix docs and add owlvit docs test

* fix minor bug in post_process, add to processor

* improve owlvit code examples

* fix hardcoded image size
2022-07-26 10:55:14 +03:00
d32558cc7a Good difficult issue override for the stalebot (#18094) 2022-07-26 03:39:14 -04:00
f65307e498 Fix dtype of input_features in docstring (#18258)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-26 09:34:06 +02:00
bd87480d20 Fix command of doc tests for local testing (#18236)
* Fix command of doc tests for local testing

* Fix command for after running doc tests locally
2022-07-26 03:07:11 -04:00
45a1475462 Fix TF bad words filter with XLA (#18286)
* Fix bad words filter in XLA generation

* Remove my cool debug breakpoints (again)
2022-07-25 20:19:39 +01:00
f4e172716b Allows KerasMetricCallback to use XLA generation (#18265)
* Allows `KerasMetricCallback` to use XLA generation

* make fixup

* Slightly reword docstring
2022-07-25 12:51:37 +01:00
bbb62f2924 Skip passes report for --make-reports (#18250)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-25 11:09:23 +02:00
7e44226fc7 Generate: deprecate default max_length (#18018) 2022-07-23 18:02:03 +01:00
8e8384663d Update serving code to enable saved_model=True (#18153)
* Add serving_output and serving methods to some vision models

* Add serving outputs for DeiT

* Don't convert hidden states - differing shapes

* Make saveable

* Fix up

* Make swin saveable

* Add in tests

* Fix funnel tests (can't convert to tensor)

* Fix numpy call

* Tidy up a bit

* Add in hidden states - resnet

* Remove numpy

* Fix failing tests - tensor shape and skipping tests

* Remove duplicated function

* PR comments - formatting and var names

* PR comments
Add suggestions made by Joao Gante:
* Use tf.shape instead of shape_list
* Use @tooslow decorator on tests
* Simplify some of the logic

* PR comments
Address Yih-Dar Sheih comments - making tensor names consistent and make types float

* Types consistent with docs; disable test on swin (slow)

* CI trigger

* Change input_features to float32

* Add serving_output for segformer

* Fixup

Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
2022-07-22 18:05:38 +01:00
07505358ba Change how take_along_axis is computed in DeBERTa to stop confusing XLA (#18256)
* Change how `take_along_axis` is computed in DeBERTa to stop confusing XLA

* Greatly simplify take_along_axis() since the code wasn't using most of it
2022-07-22 17:01:30 +01:00
d95a32cc60 Fix torch version check in Vilt (#18260)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-22 16:24:49 +02:00
7cb4da13fe change bloom parameters to 176B (#18235) 2022-07-22 10:17:48 -04:00
1fc4b2a132 TF: use the correct config with (...)EncoderDecoder models (#18097) 2022-07-22 13:31:45 +01:00
4935409757 Add Italian translation of create_model.mdx and serialization.mdx (#17640)
* First commit

* final changes

* Changed create_model to create_a_model
Translated into crea un'architettura personalizzata in the file it/_toctree.yml

* Added _toctree.yml in the italian translation loca: serialization title Esporta modelli transformers

* Edit translation for create_model.mdx

* t with '#' will be ignored, and an empty message aborts the commit.

* Added file serialization for translation in italian

* Fix toctree serialization position

I checked the eng toctree and realized I made a mistake.

* Update _toctree.yml

Correct spacing

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-22 13:53:54 +02:00
06d98e272e Fix OwlViT tests (#18253)
* Fix OwlViT tests

* Forgot one
2022-07-22 13:32:19 +02:00
12d66b4701 Add OWL-ViT model for zero-shot object detection (#17938)
* add owlvit model skeleton

* add class and box predictor heads

* convert modified flax clip to pytorch

* fix box and class predictors

* add OwlViTImageTextEmbedder

* convert class and box head checkpoints

* convert image text embedder checkpoints

* add object detection head

* fix bugs

* update conversion script

* update conversion script

* fix q,v,k,out weight conversion conversion

* add owlvit object detection output

* fix bug in image embedder

* fix bugs in text embedder

* fix positional embeddings

* fix bug in inference mode vision pooling

* update docs, init tokenizer and processor files

* support batch processing

* add OwlViTProcessor

* remove merge conflicts

* readd owlvit imports

* fix bug in OwlViTProcessor imports

* fix bugs in processor

* update docs

* fix bugs in processor

* update owlvit docs

* add OwlViTFeatureExtractor

* style changes, add postprocess method to feature extractor

* add feature extractor and processor tests

* add object detection tests

* update conversion script

* update config paths

* update config paths

* fix configuration paths and bugs

* fix bugs in OwlViT tests

* add import checks to processor

* fix docs and minor issues

* fix docs and minor issues

* fix bugs and issues

* fix bugs and issues

* fix bugs and issues

* fix bugs and issues

* update docs and examples

* fix bugs and issues

* update conversion script, fix positional embeddings

* process 2D input ids, update tests

* fix style and quality issues

* update docs

* update docs and imports

* update OWL-ViT index.md

* fix bug in OwlViT feature ext tests

* fix code examples, return_dict by default

* return_dict by default

* minor fixes, add tests to processor

* small fixes

* add output_attentions arg to main model

* fix bugs

* remove output_hidden_states arg from main model

* update self.config variables

* add option to return last_hidden_states

* fix bug in config variables

* fix copied from statements

* fix small issues and bugs

* fix bugs

* fix bugs, support greyscale images

* run fixup

* update repo name

* merge OwlViTImageTextEmbedder with obj detection head

* fix merge conflict

* fix merge conflict

* make fixup

* fix bugs

* fix bugs

* add additional processor test
2022-07-22 13:35:32 +03:00
99eb9b523f Fix no_trainer CI (#18242)
* Fix all tests
2022-07-21 14:44:57 -04:00
561b9a8c00 [SegFormer] TensorFlow port (#17910)
* add: segformer utils and img. classification.

* add: segmentation layer.

* feat: working implementation of segformer.

* chore: remove unused variable.

* add test, remaining modifications.

* remove: unnecessary files.

* add: rest of the files.

Co-authored-by: matt <rocketknight1@gmail.com>

* chore: remove ModuleList comment.

* chore: apply make style.

* chore: apply make fixup-copies.

* add  to check_repo.py

* add decode head to IGNORE_NON_TESTED

* chore: run make style.

* chore: PR comments.

* chore: minor changes to model doc.

* tests: reduction across samples.

* add a note on the space.

* sort importats.

* fix: reduction in loss computation.

* chore: align loss function with that of NER.

* chore: correct utils/documentation_tests.txt

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* chore: simplify the interpolation of logits in loss computation.

* chore: return transposed logits when return_dict=False.

* chore: add link to the tf fine-tuning repo.

* address pr comments.

* address niels's comments.

* remove from_pt=True since tf weights are in.

* remove comment from pt model.

* address niels's comments.

Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2022-07-21 18:22:37 +01:00
2c5747edfe Update notification service (#17921)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-21 15:03:50 +02:00
07575e869d Italian/accelerate (#17698)
* Add 'accelerate' to _toctree file

* Fix 'training with a nb' title

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-21 14:23:47 +02:00
8881e58b22 Italian/model sharing (#17828)
* Add Italian translation of the doc file model_sharing.mdx

* Fix style

* Fix typo

* Update docs/source/it/_toctree.yml

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-21 14:07:53 +02:00
0d971be84f Italian translation of run_scripts.mdx gh-17459 (#17642)
* Run_scripts Italian translation gh-17459

* Updated run_scripts gh-17642

* Updated run_scripts gh-17642

Made the text more gender-neutral.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-21 12:02:08 +02:00
ba552dd027 Make errors for loss-less models more user-friendly (#18233) 2022-07-21 11:52:33 +02:00
43a5375cc1 Fix TrainingArguments help section (#18232) 2022-07-21 11:03:25 +02:00
9f787ce874 Translation/debugging (#18230)
* added debugging.mdx

* updated debugging.mdx

* updated translation

* updated translation debugging

* translated debugging

* updated _toctree.yml
2022-07-21 11:02:26 +02:00
5e2f2d7dd2 Better messaging and fix for incorrect shape when collating data. (#18119)
* More informative error message

* raise dynamic error

* remove_excess_nesting application

* incorrect shape assertion for collator & function to remove excess nesting from DatasetDict

* formatting

* eliminating datasets import

* removed and relocated remove_excess_nesting to the datasets library and updated docs accordingly

* independent assert instructions

* inform user of excess nesting
2022-07-21 10:35:41 +02:00
d23cf5b1f1 Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API (#18221)
* Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API

* Support loading checkpoints saved with SMP < 1.10 in SMP < 1.10 and SMP >= 1.10

* Support loading checkpoints saved with SMP >= 1.10 in SMP >= 1.10

* Fix bug and styling

* Update based on reviewer feedback
2022-07-21 07:56:20 +02:00
dbfeffd7c9 Update add_new_pipeline.mdx (#18224)
fix typo
2022-07-21 07:55:30 +02:00
ff56b8fbff Add custom config to quicktour (#18115)
* 📝 first draft of new quicktour

* make style

* 🖍 edit and review

* 🖍 small fixes

* 🖍 only add custom config section

* 🖍 use autoclass instead
2022-07-20 12:23:03 -05:00
9edff45362 skip some test_multi_gpu_data_parallel_forward (#18188)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-20 15:54:44 +02:00
bc6fe6fbcf Change to FlavaProcessor in PROCESSOR_MAPPING_NAMES (#18213)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-20 12:30:14 +02:00
dcec4c4387 Adding OPTForSeqClassification class (#18123)
* Adding OPTForSeqClassification class

* Fix import issues

* Add documentation for optforseqclassification

* Remove checkout

* fix failing tests

* fix typo

* Fix code formatting

* Incorporating the PR feedbacks

* Incorporate PR Feedbacks

* Fix failing test and add new test for multi label setup

* Fix formatting issue

* Fix failing tests

* Fix formatting issues

* Fix failing tests

* Fix failing tests

* Fix failing tests

* Fix failing tests

* PR feedback
2022-07-20 10:14:21 +02:00
0ed4d0dfb6 Fix LayoutXLM docstrings (#17038)
* Fix docstrings

* Fix legacy issue

* up

* apply suggestions

* up

* quality
2022-07-20 09:49:57 +02:00
4b1ed7979f update cache to v0.5 (#18203)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-20 08:14:10 +02:00
8a61fe0234 Reduce console spam when using the KerasMetricCallback (#18202)
* Reduce console spam when using the KerasMetricCallback

* Switch to predict_on_batch to improve performance
2022-07-19 17:00:35 +01:00
ec6cd7633f TF: Add missing cast to GPT-J (#18201)
* Fix TF GPT-J tests

* add try/finally block
2022-07-19 15:58:42 +01:00
05ed569c79 Use next-gen CircleCI convenience images (#18197)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-19 15:43:05 +02:00
9f12ec7d87 Typo in readme (#18195) 2022-07-19 15:28:37 +02:00
dc9147ff36 Custom pipeline (#18079)
* Initial work

* More work

* Add tests for custom pipelines on the Hub

* Protect import

* Make the test work for TF as well

* Last PyTorch specific bit

* Add documentation

* Style

* Title in toc

* Bad names!

* Update docs/source/en/add_new_pipeline.mdx

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Auto stash before merge of "custom_pipeline" and "origin/custom_pipeline"

* Address review comments

* Address more review comments

* Update src/transformers/pipelines/__init__.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-07-19 12:02:35 +02:00
3bb6356d4d [From pretrained] Allow download from subfolder inside model repo (#18184)
* add first generation tutorial

* [from_pretrained] Allow loading models from subfolders

* remove gen file

* add doc strings

* allow download from subfolder

* add tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply comments

* correct doc string

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-19 11:53:53 +02:00
ce0152819d Update docs README with instructions on locally previewing docs (#18196)
* Update docs README with instructions on locally previewing docs

* Add instructions to install `watchdog` before previewing the docs
2022-07-19 11:47:26 +02:00
798384467b bugfix: div-->dim (#18135) 2022-07-19 10:24:56 +02:00
e630dad555 Add vision example to README (#18194) 2022-07-19 09:46:18 +02:00
4bea6584e3 Remove use_auth_token from the from_config method (#18192)
* remove use_auth_token from from_config

* restore use_auth_token from_pretrained run_t5_mlm_flax
2022-07-19 08:13:20 +02:00
29fd471556 Use smaller variant of BLOOM for doc to fix tests 2022-07-18 15:17:29 -04:00
bc8e30bab9 FSDP integration enhancements and fixes (#18134)
* FSDP integration enhancements and fixes

* resolving comments

* fsdp fp16 mixed precision requires `ShardedGradScaler`
2022-07-19 00:02:10 +05:30
8e445ca51d Translation/training: italian translation training.mdx (#17662)
* added training.mdx

* updated training.mdx

* updated training.mdx

* updated training.mdx

* updated _toctree.yml

* fixed typos after review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-18 19:21:07 +02:00
6a1b1bf7a6 BLOOM minor fixes small test (#18175)
* minor fixes

- add correct revision
- corrected dosctring for test
- removed a test

* contrib credits

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
2022-07-18 19:18:19 +02:00
c4cc894086 Translation italian: multilingual.mdx (#17768)
* added multilingual.mdx

* updated multilingual.mdx

* italian translation multilingual.mdx

* updated _toctree.yml

* fixed typos _toctree.yml

* fixed typos after review

* fixed error after review
2022-07-18 19:09:08 +02:00
0a5b61d004 Added preprocessing.mdx italian translation (#17600)
* updated _toctree.yml

* added preprocessing

* updated preprocessing.mdx

* updated preprocessing.mdx

updated after review
2022-07-18 19:06:10 +02:00
ced1f1f5db fix typo inside bloom documentation (#18187) 2022-07-18 17:43:52 +02:00
edadfc58af Better default for offload_state_dict in from_pretrained (#18183) 2022-07-18 16:02:41 +02:00
aeeab1ffd0 Fix template for new models in README (#18182) 2022-07-18 16:01:51 +02:00
45255814a2 FIX: Typo (#18156) 2022-07-18 15:46:08 +02:00
6561fbcc6e Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests (#18073)
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-18 15:29:14 +02:00
cb19c2afdc Fix expected loss values in some (m)T5 tests (#18177)
* fix expected loss values

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-18 15:26:21 +02:00
7417f3acb7 [HPO] update to sigopt new experiment api (#18147)
* [HPO] update to sigopt new experiment api
* follow https://docs.sigopt.com/experiments

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* [HPO] use new API if sigopt version >= 8.0.0

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-07-18 15:19:40 +02:00
8c14b342aa add ONNX support for LeVit (#18154)
Co-authored-by: Guilhem Chéron <guilhemc@authentifier.com>
2022-07-18 15:17:07 +02:00
c1c79b0655 NLLB tokenizer (#18126)
* NLLB tokenizer

* Apply suggestions from code review - Thanks Stefan!

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* Final touches

* Style :)

* Update docs/source/en/model_doc/nllb.mdx

Co-authored-by: Stefan Schweter <stefan@schweter.it>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* PR reviews

* Auto models

Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-18 08:12:34 -04:00
a4f97e6ce0 Fix incorrect type hint for lang (#18161) 2022-07-18 09:53:18 +02:00
c46d39f390 Fix check for falsey inputs in run_summarization (#18155) 2022-07-18 09:50:32 +02:00
ccc0897804 Adding support for device_map directly in pipeline(..) function. (#17902)
* Adding support for `device_map` directly in `pipeline(..)` function.

* Updating the docstring.

* Adding a better docstring

* Put back type hints.

* Blacked. (`make fixup` didn't work ??!!)
2022-07-15 15:54:26 +02:00
fca66ec4ef Fixing a hard to trigger bug for text-generation pipeline. (#18131)
* Fixing a bug where attention mask was not passed to generate.

* Fixing zero-size prompts.

* Comment on top.
2022-07-15 15:54:07 +02:00
8581a798c0 Add TF DeiT implementation (#17806)
* Initial TF DeiT implementation

* Fix copies naming issues

* Fix up + docs

* Properly same main layer

* Name layers properly

* Initial TF DeiT implementation

* Fix copies naming issues

* Fix up + docs

* Properly same main layer

* Name layers properly

* Fixup

* Fix import

* Fix import

* Fix import

* Fix weight loading for tests whilst not on hub

* Add doc tests and remove to_2tuple

* Add back to_2tuple
Removing to_2tuple results in many downstream changes needed because of the copies checks

* Incorporate updates in Improve vision models #17731 PR

* Don't hard code num_channels

* Copy PyTorch DeiT embeddings and remove pytorch operations with mask

* Fix patch embeddings & tidy up

* Update PixelShuffle to move logic into class layer

* Update doc strings - remove PT references

* Use NHWC format in internal layers

* Fix up

* Use linear activation layer

* Remove unused import

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Move dataclass to top of file

* Remove from_pt now weights on hub

* Fixup

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
2022-07-13 18:04:08 +01:00
Wei
7ea6ccc2b3 Enable torchdynamo with torch_tensorrt(fx path) (#17765)
* enable fx2trt

* Update perf_train_gpu_one.mdx

* Update perf_train_gpu_one.mdx

* add lib check

* update

* format

* update

* fix import check

* fix isort

* improve doc

* refactor ctx manager

* fix isort

* black format

* isort fix

* fix format

* update args

* update black

* cleanups

* Update perf_train_gpu_one.mdx

* code refactor

* code refactor to init

* remove redundancy

* isort

* replace self.args with args

Co-authored-by: Stas Bekman <stas@stason.org>
2022-07-13 12:43:28 -04:00
37aeb5787a Make sharded checkpoints work in offline mode (#18125)
* Make sharded checkpoints work in offline mode

* Add test
2022-07-13 12:43:08 -04:00
0a21a48564 Revert "Make sharded checkpoints work in offline mode"
This reverts commit 3564c6578630a3bef29d2c7c36c7d29b68acd874.
2022-07-13 10:53:25 -04:00
3564c65786 Make sharded checkpoints work in offline mode 2022-07-13 10:51:56 -04:00
56e6487c40 add dataset split and config to model-index in TrainingSummary.from_trainer (#18064)
* added metadata to training summary

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-13 16:07:20 +02:00
fde22c75a1 Add summarization name mapping for MultiNews (#18117)
* Add summarization name mapping for MultiNews

* Add summarization name mapping for MultiNews
2022-07-13 08:19:20 -04:00
195133363e supported python versions reference (#18116)
* supported python versions reference

* Update CONTRIBUTING.md

removing commit hash from link

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-13 08:18:44 -04:00
20509ab0e0 TF: unpack_inputs decorator independent from main_input_name (#18110) 2022-07-13 10:43:41 +01:00
fcefa200b2 TF: remove graph mode distinction when processing boolean options (#18102) 2022-07-12 19:05:31 +01:00
bc34c21191 Fix BLOOM dtype (#17995)
* Add fp16 option

* Fix BLOOM dtype

* Formatting

* Remove torch_dtype arg

* Revert formatting

* Apply formatting

* Add n_embed backward compat
2022-07-12 10:36:08 -04:00
981714efe1 CLI: reenable pt_to_tf test (#18108) 2022-07-12 13:38:05 +01:00
f5221c06e4 Report value for a step instead of epoch. (#18095)
* Report value for a step instead of epoch.

Report an objective function value for a step instead of epoch to optuna.
I made this modification for the following reason:
If "eval_steps" is less than steps per epoch, there maybe warnings like this: "optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 0 is already reported.". So "step" are more appropriate than "epoch" here.

* MOD: make style.

Co-authored-by: zhaowei01 <zhaowei01@yuanfudao.com>
2022-07-12 08:18:35 -04:00
d4ebd4e112 speed up test (#18106) 2022-07-12 04:28:28 -04:00
b7d8bd378c Enhance IPEX integration in Trainer (#18072)
* enhance ipex import

* refine codes

* refine style

* add link

* style

Co-authored-by: Stas Bekman <stas@stason.org>
2022-07-11 21:34:09 -07:00
a462fc9232 Bloom Optimize operations (#17866)
* fix tolerance for a bloom slow test

* enhance alibi padding

- get rid of for loops
- deals better with padded batched input
- avoid useless cpu/gpu communication when creating alibi

Co-authored-by: justheuristic <justheuristic@gmail.com>

* optimize attention mask

* fix scaled softmax limit values

* optimize building alibi tensor

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* fix attention_mask shape when it's None

* minor fixes

- fix docstring + arg names

* remove colons in docstring

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* apply suggestion

* remove unsued arg

* refactor a bit

- use [:, None] for consistency

* refactor attention block

Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>

* quick fixes

* first attempt

* refactor attention block and fix all tests except "test_simple_generation"

- added comments to better explain attention block

* remove debug lines and add TODO comment

* change `torch.bmm` to `torch.baddbmm`
- fixes `test_simple_generation`but breaks `test_batch_generation_padd`

* styling

* all tests are passing now
- use `bmm`
- add explanation for `allow_fp16_reduced_precision_reduction`

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* styling

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* fix support for accelerate

Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove attn softmax in fp32

* refactor comments

* refactor a bit

- remove warning message
- remove print on test

* refer to pytorch t5

* change the slow tests

- do the tests in fp32
- remove some comments
- keep large comments

* update expected output for `test_simple_generation`
- we now test using fp32

* make style + change comments a bit

* fix dtype padd test

Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-11 13:16:13 -04:00
5ff6f853d7 Mark slow test as such 2022-07-11 12:48:57 -04:00
b1b8222d80 Add filename to info diaplyed when downloading things in from_pretrained (#18099) 2022-07-11 12:45:06 -04:00
6c8017a5c8 Fix image segmentation and object detection pipeline tests (#18100) 2022-07-11 12:41:56 -04:00
b0520f594c Skip failing tests 2022-07-11 10:16:54 -04:00
1e8140caad Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts (#18069)
* Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts

* using np.permutation for creating batch_idx

* train_samples_idx -> training_samples_idx

* fix type hints
2022-07-11 15:59:08 +02:00
ac98a88fbc Fix torchscript tests for GPT-NeoX (#18012)
* fix dtype issue in _attn

* fix RotaryEmbedding

* fix RotaryEmbedding 2

* clean up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-11 05:02:54 -04:00
95113d1365 Fix some typos. (#17560)
* Fix some typos.

Signed-off-by: Yulv-git <yulvchi@qq.com>

* Fix typo.

Signed-off-by: Yulv-git <yulvchi@qq.com>

* make fixup.
2022-07-11 05:00:13 -04:00
ad28ca291b [bloom] fix alibi device placement (#18087) 2022-07-10 09:11:46 -07:00
8b332a6a16 Make predict() close progress bars after finishing (#17952) (#18078)
* Make Trainer.predict call on_evaluate (#17952)

* Add on_predict

* Small fix

* Small and different fix

* Add tests
2022-07-08 16:44:24 -04:00
7c046c5c22 Update localized READMES when template is filled. (#18062) 2022-07-08 11:08:52 -04:00
94ca7d2faa Fix type issue in using bucketing with Trainer (#18051)
* Fix type issue in using bucketing with Trainer

- Fix type issues in LengthGrouperSampler,
  DistributedLengthGroupedSampler

refs: #18003

* Change logging type in LengthGroupedSampler

- Change `logger.warning` to `logger.info`

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Change logging type in DistributedLengthGroupedSampler

- Change `logger.warning` to `logger.info`

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove adundant clause in LengthGroupedSampler

- Use `elif`

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove adundant clause in DistributedLengthGroupedSampler

- Use `elif`

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply black, isort to modified codes in the script

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-08 11:06:00 -04:00
9bd3968509 Fix slow CI by pinning resampy (#18077)
* Fix slow CI by pinning resampy

* Actually put it in the speech dependencies
2022-07-08 10:51:24 -04:00
de46cde14b Drop columns after loading samples in prepare_tf_dataset (#17967)
* Drop columns after loading samples, rather than before, to avoid breaking transforms

* make fixup

* Add workaround so this PR can work with current datasets version
2022-07-07 18:02:22 +01:00
2544c1434f [Generate Tests] Make sure no tokens are force-generated (#18053) 2022-07-07 15:08:34 +02:00
91c4a3ab1a Added Command for windows VENV activation in installation docs (#18008)
* Added command for windows VENV activation

* changed linux and macos  specification
2022-07-07 08:18:44 -04:00
1b749a7f8d Sort doc toc (#18034)
* Add script to sort doc ToC

* Style and fixes

* Add check to quality job
2022-07-07 08:17:58 -04:00
1b5ea74783 Place inputs on device when include_inputs_for_metrics is True (#18046) 2022-07-07 08:17:49 -04:00
870ff9e1da Skip failing test until @gante fix it. 2022-07-06 15:13:28 -04:00
2e90c3df8f Doc to dataset (#18037)
* Link to the Datasets doc

* Remove unwanted file
2022-07-06 12:10:06 -04:00
be79cd7d8e Protect TFGenerationMixin.seed_generator so it's not created at import (#18044) 2022-07-06 16:36:28 +01:00
360719a6a4 TF: GPT-J compatible with XLA generation (#17986) 2022-07-06 15:02:07 +01:00
bf37e5c7f6 Fix T5 incorrect weight decay in Trainer and official summarization example (#18002)
* Add ALL_LAYERNORM_LAYERS for LayerNorm

* fix bug of appending layer norm
2022-07-06 09:44:19 -04:00
22edb68d49 Squash commits (#17981)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-07-06 08:11:48 -04:00
f681437203 Enable Past CI (#17919)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-05 18:08:36 +02:00
5ae087cf8e Fix T5/mT5 tests (#18029) 2022-07-05 16:22:03 +01:00
ec07eccc7d [Flax] Bump to v0.4.1 (#17966) 2022-07-05 15:17:17 +01:00
97db5b4223 Update expected values in DecisionTransformerModelIntegrationTest (#18016)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-05 14:53:43 +02:00
f0982682bd TF: T5 can now handle a padded past (i.e. XLA generation) (#17969)
* get the right slicing index for position_bias
2022-07-04 19:47:43 +01:00
e3139ad301 fixed calculation of ctc loss in TFWav2Vec2ForCTC (#18014)
Co-authored-by: Sreyan-G@NVIDIA <sreyang@nvidia.com>
2022-07-04 17:36:36 +01:00
96d833b211 Return scalar losses instead of per-sample means (#18013)
* Return scalar losses instead of per-sample means

* Make loss shape (1,) instead of scalar

* Allow scalar losses in test_loss_computation

* Allow scalar losses in test_loss_computation

* Allow scalar losses in test_loss_computation

* Remove XLA loss function for RAG
2022-07-04 17:26:19 +01:00
6cb19540c9 sort list of models (#18011) 2022-07-04 09:20:55 -04:00
7498db06a1 Replace BloomTokenizer by BloomTokenizerFast in doc (#18005) 2022-07-04 08:40:13 -04:00
3cfdefaa4d Fix typo in error message in generation_utils (#18000) 2022-07-04 06:04:58 -04:00
cf2578ae00 Refactor to inherit from nn.Module instead of nn.ModuleList (#17501)
* Refactor to inherit from nn.Module instead of nn.ModuleList

* Fix typo

* Empty to trigger CI re-run

Blender Bot tests failing (should be unrelated to this PR) and pass locally). I don't have sufficient permisisons to re-run the CI workflow (totally or from failed)
2022-07-04 06:03:42 -04:00
77ea5130a1 Add TF ResNet model (#17427)
* Rought TF conversion outline

* Tidy up

* Fix padding differences between layers

* Add back embedder - whoops

* Match test file to main

* Match upstream test file

* Correctly pass and assign image_size parameter

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Add in MainLayer

* Correctly name layer

* Tidy up AdaptivePooler

* Small tidy-up

More accurate type hints and remove whitespaces

* Change AdaptiveAvgPool

Use the AdaptiveAvgPool implementation by @Rocketknight1, which correctly pools if the output shape does not evenly divide by input shape c.f. 9e26607e22 (r900109509)

Co-authored-by: From: matt <rocketknight1@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Use updated AdaptiveAvgPool

Co-authored-by: matt <rocketknight1@gmail.com>

* Make AdaptiveAvgPool compatible with CPU

* Remove image_size from configuration

* Fixup

* Tensorflow -> TensorFlow

* Fix pt references in tests

* Apply suggestions from code review - grammar and wording

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add TFResNet to doc tests

* PR comments - GlobalAveragePooling and clearer comments

* Remove unused import

* Add in keepdims argument

* Add num_channels check

* grammar fix: by -> of

Co-authored-by: matt <rocketknight1@gmail.com>

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Remove transposes - keep NHWC throughout forward pass

* Fixup look sharp

* Add missing layer names

* Final tidy up - remove from_pt now weights on hub

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-07-04 10:59:15 +01:00
7b18702ca7 Add link to existing documentation (#17931) 2022-07-04 04:13:05 -04:00
a045cbd6c9 only a stupid typo, but it can lead to confusion (#17930) 2022-07-04 04:04:16 -04:00
49c8c67fb8 Exclude Databricks from notebook env only if the runtime is below 11.0 (#17988)
* Exclude Databricks from notebook env only if the runtime is below 11.0

* Dummy commit to trigger CI

* Empty commit to trigger CI

* Empty commit to trigger CI

* Empty commit to trigger CI

* Empty commit to trigger CI

* Empty commit to trigger CI

* Empty commit to trigger CI

* Empty commit to trigger CI
2022-07-01 16:17:40 -04:00
6890d1960f Shifting labels for causal LM when using label smoother (#17987)
* Shifting labels for causal LM when using label smoother

When training CausalLM, loss is computed within model's foward() function and
labels are shifted internally. However, if label smoothing is applied, loss is
computed in trainer's compute_loss function and labels are not shifted.
This causes unintended confusion during the alignment of labels and corresponding
inputs. This commit is for resolving this confusion.

Resolves #17960

On branch shift_labels_for_causalLM
Changes to be committed:
	modified:   src/transformers/trainer.py
	modified:   src/transformers/trainer_pt_utils.py

* Update trainer.py

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-01 14:55:35 -04:00
6f0723a9be Restore original task in test_warning_logs (#17985)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-01 20:44:27 +02:00
009171d1ba Ensure PT model is in evaluation mode and lightweight forward pass done (#17970) 2022-07-01 19:33:47 +01:00
d6cec45801 XLA train step fixes (#17973)
* Copy inputs to train and test step before modifying them, as this breaks things

* Add XLA tests, fix our loss functions to be XLA-compatible

* make fixup

* Update loss computation test to expect vector of per-sample losses

* Patch loss for TFLED

* Patch loss for TFAlbert

* Add a tf_legacy_loss config flag that enables old loss functions

* Stop using config.get() because it's not a dict

* Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it

* make fixup

* Add XLA-compatible RAG loss

* Fix dtype of loss mask for TFAlbert

* Fix test for XLNet too because it overrides the default one

* make fixup

* Fix config test

* No more depending on GPU NaN behaviour

* Add test, avoid potential zero division

* Fix test item assignment

* Fix loss computation masking test

* make fixup

* Fix dtype bugs
2022-07-01 19:11:14 +01:00
485bbe79d5 [Flax] Add remat (gradient checkpointing) (#17843)
* [Flax] Add remat (gradient checkpointing)

* fix variable naming in test

* flip: checkpoint using a method

* fix naming

* fix class naming

* apply PVP's suggestions from code review

* make fix-copies

* fix big-bird, electra, roberta

* cookie-cutter

* fix flax big-bird

* move test to common
2022-07-01 18:33:54 +01:00
664688b94f higher atol to avoid flaky trainer test failure (#17979)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-01 17:53:16 +02:00
8bb2c387f4 Fix FlaxBigBirdEmbeddings (#17842)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-01 16:46:01 +02:00
b68d408f1b add ONNX support for BLOOM (#17961)
* add onnx support for BLOOM

* use TYPE_CHECKING for type annotations

* fix past_shape for bloom (different from gpt2)

* use logical_or instead of `+` for onnx support

* bigger `atol_for_validation` for larger bloom models

* copied -> taken because it's no longer an exact copy

* remove "copied from" comment

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-07-01 10:44:42 -04:00
462b7f3a94 fixing fsdp autowrap functionality (#17922)
* fixing fsdp autowrap functionality

* update version and quality

* update torch version to latest stable version
2022-07-01 19:40:55 +05:30
3a064bd4dd fix bias keyword argument in TFDebertaEmbeddings (#17940) 2022-07-01 14:48:43 +01:00
569b679adb Update expected values in CodeGen tests (#17888)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-01 15:33:36 +02:00
cb42502410 Fix typo in perf_train_gpu_one.mdx (#17983) 2022-07-01 09:19:13 -04:00
14fb8a63b9 skip some gpt_neox tests that require 80G RAM (#17923)
* skip some gpt_neox tests that require 80G RAM

* remove tests

* fix quality

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-07-01 09:04:38 -04:00
49cd736a28 feat: add pipeline registry abstraction (#17905)
* feat: add pipeline registry abstraction

- added `PipelineRegistry` abstraction
- updates `add_new_pipeline.mdx` (english docs) to reflect the api addition
- migrate `check_task` and `get_supported_tasks` from
  transformers/pipelines/__init__.py to
  transformers/pipelines/base.py#PipelineRegistry.{check_task,get_supported_tasks}

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix: update with upstream/main

chore: Apply suggestions from sgugger's code review

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* chore: PR updates

- revert src/transformers/dependency_versions_table.py from upstream/main
- updates pipeline registry to use global variables

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* tests: add tests for pipeline registry

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* tests: add test for output warning.

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: fmt and cleanup unused imports

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix: change imports to top of the file and address comments

Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-30 12:11:08 -04:00
9cb7cef285 Add ONNX support for LayoutLMv3 (#17953)
* Add ONNX support for LayoutLMv3

* Update docstrings

* Update empty description in docstring

* Fix imports and type hints
2022-06-30 12:09:52 -04:00
fe14046421 skip some ipex tests until it works with torch 1.12 (#17964)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-30 18:05:29 +02:00
91e1f24ef3 CLI: convert sharded PT models (#17959)
* sharded conversion; add flag to control max hidden error

* better hidden name matching

* Add test: load TF from PT shards

* fix test (PT data must be local)
2022-06-30 16:51:03 +01:00
f25457b273 Fix number of examples for iterable dataset in distributed training (#17951) 2022-06-30 11:01:40 -04:00
e4d2588573 [Pipelines] Add revision tag to all default pipelines (#17667)
* trigger test failure

* upload revision poc

* Update src/transformers/pipelines/base.py

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* up

* add test

* correct some stuff

* Update src/transformers/pipelines/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct require flag

Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-30 16:37:18 +02:00
4f8361afe7 Unifying training argument type annotations (#17934)
* doc: Unify training arg type annotations

* wip: extracting enum type from Union

* blackening
2022-06-30 08:53:32 -04:00
205bc4152c Fix GPT-NeoX-20B past handling, attention computation (#17811)
* Fix GPT-NeoX-20B past handling, swap attention computation to hopefully avoid NaN, update docs

* 20B tests
2022-06-30 08:47:40 -04:00
692e61e91a Flax t5 Encoder (#17784)
* first draft adding Flax-t5-encoder and Flax-mt5-encoder

* imports

* after make fixup

* flax t5 encoder test

* black on test

* make fix-copies

* clean

* all_model_classes -> tuple

* clean test

* is_encoder_decoder=False in t5-enc tester

* remove file docstring before FlaxT5Encoder

* black

* isort

* commit suggestions on src/transformers/models/t5/modeling_flax_t5.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* commit suggestions on src/transformers/models/t5/modeling_flax_t5.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* remove _get_encoder_module

* self.decoder_seq_length -> self.encoder_seq_length as t5-enc does not have decoder

* bugfix - self.module_class is class itself, not instance;

* docs for mt5 and t5

* call -> __call__ in t5 doc

* FlaxMT5EncoderModel to TYPE_HINT

* run doc-builder to allow change the files

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2022-06-30 00:49:02 +02:00
eb1493b15d Fix #17893, removed dead code (#17917)
* Removed dead position_id code, fix #17893

* Removed unused var

* Now ignores removed (dead) dict key for backward comp
2022-06-29 17:54:26 -04:00
fbc7598bab add MobileViT model (#17354)
* add MobileViT

* fixup

* Update README.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* remove empty line

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* use clearer variable names

* rename to MobileViTTransformerLayer

* no longer inherit from nn.Sequential

* fixup

* fixup

* not sure why this got added twice

* rename organization for checkpoints

* fix it up

* Update src/transformers/models/mobilevit/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/configuration_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/configuration_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/configuration_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/models/mobilevit/test_modeling_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/modeling_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/modeling_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/modeling_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mobilevit/modeling_mobilevit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* code style improvements

* fixup

* Update docs/source/en/model_doc/mobilevit.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/mobilevit/configuration_mobilevit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/mobilevit/configuration_mobilevit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* download labels from hub

* rename layers

* rename more layers

* don't compute loss in separate function

* remove some nn.Sequential

* replace nn.Sequential with new MobileViTTransformer class

* replace nn.Sequential with MobileViTMobileNetLayer

* fix pruning since model structure changed

* fixup

* fix doc comment

* remove custom resize from feature extractor

* fix ONNX import

* add to doc tests

* use center_crop from image_utils

* move RGB->BGR flipping into image_utils

* fix broken tests

* wrong type hint

* small tweaks

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-29 16:07:51 -04:00
5feac3d080 Fix prepare_tf_dataset when drop_remainder is not supplied (#17950) 2022-06-29 19:23:39 +01:00
bc019b0e5f ExplicitEnum subclass str (JSON dump compatible) (#17933)
* ExplicitEnum subclass str (JSON dump compatible)

* allow union if one of the types is str
2022-06-29 13:49:31 -04:00
b089cca347 PyTorch 1.12.0 for scheduled CI (#17949)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-29 19:32:19 +02:00
d444edb3f6 OPT - Fix Softmax NaN in half precision mode (#17437) 2022-06-29 19:15:32 +02:00
9fe2403bc5 Use explicit torch version in deepspeed CI (#17942)
* use explicit torch version

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-29 18:20:34 +02:00
4c722e9e22 fix regexes with escape sequence (#17943) 2022-06-29 08:55:22 -07:00
7c4c6f6084 Fix all is_torch_tpu_available issues (#17936)
* Fix all is_torch_tpu_available
2022-06-29 11:03:33 -04:00
77b76672e2 Fix img seg tests (load checkpoints from hf-internal-testing) (#17939)
* Revert "Skip failing test until they are fixed."

This reverts commit 8f400775fc5bc1011a2674dcfd5408d30d69f678.

* Use `tiny-detr` checkpts from `hf-internal-testing`
2022-06-29 10:19:37 -04:00
3cff4cc587 Add MVP model (#17787)
* Add MVP model

* Update README

* Remove useless module

* Update docs

* Fix bugs in tokenizer

* Remove useless test

* Remove useless module

* Update vocab

* Remove specifying

* Remove specifying

* Add #Copied ... statement

* Update paper link

* Remove useless TFMvp

* Add #Copied ... statement

* Fix style in test mvp model

* Fix some typos

* Fix properties of unset special tokens in non verbose mode

* Update paper link

* Update MVP doc

* Update MVP doc

* Fix README

* Fix typos in docs

* Update docs
2022-06-29 09:30:55 -04:00
8f400775fc Skip failing test until they are fixed. 2022-06-29 09:11:29 -04:00
47b9165109 Remove imports and use forward references in ONNX feature (#17926) 2022-06-29 09:02:53 -04:00
5cdfff5df3 Fix job links in Slack report (#17892)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-29 14:53:13 +02:00
a7eba83161 TF implementation of RegNets (#17554)
* chore: initial commit

Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets.

* chore: porting the rest of the modules to tensorflow

did not change the documentation yet, yet to try the playground on the model

* Fix initilizations (#1)

* fix: code structure in few cases.

* fix: code structure to align tf models.

* fix: layer naming, bn layer still remains.

* chore: change default epsilon and momentum in bn.

* chore: styling nits.

* fix: cross-loading bn params.

* fix: regnet tf model, integration passing.

* add: tests for TF regnet.

* fix: code quality related issues.

* chore: added rest of the files.

* minor additions..

* fix: repo consistency.

* fix: regnet tf tests.

* chore: reorganize dummy_tf_objects for regnet.

* chore: remove checkpoint var.

* chore: remov unnecessary files.

* chore: run make style.

* Update docs/source/en/model_doc/regnet.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* chore: PR feedback I.

* fix: pt test. thanks to @ydshieh.

* New adaptive pooler (#3)

* feat: new adaptive pooler

Co-authored-by: @Rocketknight1

* chore: remove image_size argument.

Co-authored-by: matt <rocketknight1@gmail.com>

Co-authored-by: matt <rocketknight1@gmail.com>

* Empty-Commit

* chore: remove image_size comment.

* chore: remove playground_tf.py

* chore: minor changes related to spacing.

* chore: make style.

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: amyeroberts <aeroberts4444@gmail.com>

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: amyeroberts <aeroberts4444@gmail.com>

* chore: refactored __init__.

* chore: copied from -> taken from./g

* adaptive pool -> global avg pool, channel check.

* chore: move channel check to stem.

* pr comments - minor refactor and add regnets to doc tests.

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* minor fix in the xlayer.

* Empty-Commit

* chore: removed from_pt=True.

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-06-29 13:45:14 +01:00
e6d27ca5c8 TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible (#17857)
* working beam search 🎉

* XLA generation compatible with ALL classes

* add xla generation slow test
2022-06-29 12:41:01 +01:00
b8142753f9 Add missing comment quotes (#17379) 2022-06-29 06:16:36 -04:00
e113c5cb64 Remove render tags (#17897)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-06-29 06:06:42 -04:00
90415475bb Fix the Conda package build (#16737)
* Fix the Conda package build

* Update build.sh

* Update release-conda.yml
2022-06-29 06:03:16 -04:00
babd7b1a92 Remove DT_DOUBLE from the T5 graph (#17891) 2022-06-29 10:23:49 +01:00
6aae59d0b5 Compute min_resolution in prepare_image_inputs (#17915)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-29 10:30:20 +02:00
776855c752 Fixing a regression with return_all_scores introduced in #17606 (#17906)
Fixing a regression with `return_all_scores` introduced in #17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.

Tmp legacy tests.

Actually fix the regression (also contains lists)

Less diffed code.
2022-06-28 17:24:45 -04:00
5f1e67a566 Pin PyTorch in requirements as well 2022-06-28 15:56:10 -04:00
5a3d0cbdda Pin PyTorch while we fix compatibility with 1.12 2022-06-28 15:07:26 -04:00
6c8f4c9a93 Adding GroupViT Models (#17313)
* add group vit and fixed test (except slow)

* passing slow test

* addressed some comments

* fixed test

* fixed style

* fixed copy

* fixed segmentation output

* fixed test

* fixed relative path

* fixed copy

* add ignore non auto configured

* fixed docstring, add doc

* fixed copies

* Apply suggestions from code review

merge suggestions

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* resolve comment, renaming model

* delete unused attr

* use fix copies

* resolve comments

* fixed attn

* remove unused vars

* refactor tests

* resolve final comments

* add demo notebook

* fixed inconsitent default

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* rename stage->stages

* Create single GroupViTEncoderLayer class

* Update conversion script

* Simplify conversion script

* Remove cross-attention class in favor of GroupViTAttention

* Convert other model as well, add processor to conversion script

* addressing final comment

* fixed args

* Update src/transformers/models/groupvit/modeling_groupvit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-06-28 20:51:47 +02:00
b424f0b4a3 Mrbean/codegen onnx (#17903) 2022-06-28 14:57:53 +02:00
76d13de5ae Add ONNX support for DETR (#17904) 2022-06-28 14:48:43 +02:00
bfcd5743ee In group_texts function, drop last block if smaller than block_size (#17908) 2022-06-28 08:34:55 -04:00
f71895a633 Move logic into pixelshuffle layer (#17899)
* Move all pixelshuffle logic into layer

* Rename layer

* Use correct input to function
2022-06-28 13:04:19 +01:00
0094565fc5 Fix loss computation in TFBertForPreTraining (#17898) 2022-06-28 12:44:56 +01:00
1dfa03f12b Pin black to 22.3.0 to benefit from a stable --preview flag (#17918) 2022-06-28 04:32:18 -04:00
9eec4e937e [M2M100] update conversion script (#17916) 2022-06-28 10:15:07 +02:00
db2644b9eb Fix PyTorch/TF Auto tests (#17895)
* add loading_info

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-28 08:56:24 +02:00
f717d47fe0 Fix test_number_of_steps_in_training_with_ipex (#17889)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-28 08:55:02 +02:00
0b0dd97737 Update expected values in constrained beam search tests (#17887)
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-28 08:53:53 +02:00
e02037b352 Fix bug in gpt2's (from-scratch) special scaled weight initialization (#17877)
* only special scale init each gpt2 c_proj weight once, on exact match

* fix double quotes

Co-authored-by: leandro <leandro.vonwerra@spoud.io>
2022-06-27 15:01:49 -04:00
6dd00f6bd4 Update README_zh-hans.md (#17861) 2022-06-27 13:09:20 -04:00
71b2839fd3 bert: add conversion script for BERT Token Dropping TF2 checkpoints (#17142)
* bert: add conversion script for BERT Token Dropping TF2 checkpoints

* bert: rename conversion script for BERT Token Dropping checkpoints

* bert: fix flake errors in BERT Token Dropping conversion script

* bert: make doc-builder happy!!1!11

* bert: fix pytorch_dump_path of BERT Token Dropping conversion script
2022-06-27 13:08:32 -04:00
98742829d3 Fix add new model like frameworks (#17869)
* Add new model like adds only the selected frameworks object in init

* Small fix
2022-06-27 13:07:34 -04:00
afb71b6726 Add type annotations for RoFormer models (#17878) 2022-06-27 14:50:43 +01:00
9a3453846b fix (#17890)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-27 14:36:11 +02:00
3ec7d4cfe4 fix mask (#17837) 2022-06-27 14:08:18 +02:00
ee0d001de7 Add a TF in-graph tokenizer for BERT (#17701)
* Add a TF in-graph tokenizer for BERT

* Add from_pretrained

* Add proper truncation, option handling to match other tokenizers

* Add proper imports and guards

* Add test, fix all the bugs exposed by said test

* Fix truncation of paired texts in graph mode, more test updates

* Small fixes, add a (very careful) test for savedmodel

* Add tensorflow-text dependency, make fixup

* Update documentation

* Update documentation

* make fixup

* Slight changes to tests

* Add some docstring examples

* Update tests

* Update tests and add proper lowercasing/normalization

* make fixup

* Add docstring for padding!

* Mark slow tests

* make fixup

* Fall back to BertTokenizerFast if BertTokenizer is unavailable

* Fall back to BertTokenizerFast if BertTokenizer is unavailable

* make fixup

* Properly handle tensorflow-text dummies
2022-06-27 12:06:21 +01:00
401fcca6c5 Fix TF GPT2 test_onnx_runtime_optimize (#17874)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-27 09:27:30 +02:00
cc5c061e34 CLI: handle multimodal inputs (#17839) 2022-06-25 16:17:11 +01:00
e8eb699ee8 Properly get tests deps in test_fetcher (#17870)
* Properly get tests deps in test_fetcher

* Remove print
2022-06-24 16:56:46 -04:00
b03be78a4b Fix test_inference_instance_segmentation_head (#17872)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-24 19:36:45 +02:00
494aac65a7 Skip test_multi_gpu_data_parallel_forward for MaskFormer (#17864)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-24 19:35:00 +02:00
0e0f1f4692 Use higher value for hidden_size in Flax BigBird test (#17822)
* Use higher value for hidden_size in Flax BigBird test

* remove 5e-5

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-24 19:31:30 +02:00
2ef94ee039 Fix: torch.utils.checkpoint import error. (#17849) 2022-06-24 13:23:29 -04:00
ef28a402a9 Add type hints for gptneox models (#17858)
* feat: Add type hints for GPTNeoxForCausalLM and GPTNeoXModel

* fix: removed imported Dict type

* fix: Removed unused List import
2022-06-24 17:12:36 +01:00
061a73d16f [CodeGen] support device_map="auto" for sharded checkpoints (#17871) 2022-06-24 18:06:30 +02:00
d6b6fb9963 Add CodeGen model (#17443)
* Add CodeGen model

* Add missing key and switch order of super()

* Fix torch.ones init with uint8 instead of bool

* Address comments: copy statements and doc

* update tests

* remove old model parallel

* fix batch gen tests

* fix batch gen test

* update test_gpt2_sample_max_time

* fix codgen test and revert gpt2 test change

* Fix incorrect tie_word_embedding value, typo, URL

* Fix model order in README and styling

* Reorder model list alphabetically

* Set tie_word_embedding to False by default

* Apply suggestions from code review

* Better attn mask name & remove attn masked_bias

* add tokenizer for codegen

* quality

* doc tokenizer

* fix-copies

* add CodeGenTokenizer in converter

* make truncation optional

* add test for truncation

* add copyright

* fix-copies

* fix fast tokenizer decode

* Update src/transformers/models/codegen/tokenization_codegen.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* increase vocab_size in tests

Co-authored-by: patil-suraj <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-06-24 17:10:38 +02:00
447490015a Fix Splinter test (#17854)
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-24 16:26:14 +02:00
73a0496c2f [tests/VisionEncoderDecoder] import to_2tuple from test utils (#17865) 2022-06-24 15:23:30 +02:00
NaN
bc7a6fdc02 Fix Constrained beam search duplication and weird output issue (#17814)
* fix(ConstrainedBeamSearchScorer.step_sentence_constraint): avoid hypothesis duplication between topk and advance

* fix(GenerationMixin.constrained_beam_search): appropriately assign beam scores instead of token scores
2022-06-24 14:56:08 +02:00
c2c0d9db5f Improve encoder decoder model docs (#17815)
* Copied all the changes from the last PR

* added in documentation_tests.txt

* Update docs/source/en/model_doc/encoder-decoder.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/encoder-decoder.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/encoder-decoder.mdx

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Update docs/source/en/model_doc/encoder-decoder.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/encoder-decoder.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/encoder-decoder.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/encoder-decoder.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: vishwaspai <vishwas.pai@emplay.net>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2022-06-24 14:48:19 +02:00
0917870510 Improve vision models (#17731)
* Improve vision models

* Add a lot of improvements

* Remove to_2tuple from swin tests

* Fix TF Swin

* Fix more tests

* Fix copies

* Improve more models

* Fix ViTMAE test

* Add channel check for TF models

* Add proper channel check for TF models

* Apply suggestion from code review

* Apply suggestions from code review

* Add channel check for Flax models, apply suggestion

* Fix bug

* Add tests for greyscale images

* Add test for interpolation of pos encodigns

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-06-24 11:34:51 +02:00
893ab12452 Auto-build Docker images before on-merge if setup.py was changed (#17573)
* Auto-build on setup modification

* Modify push-caller

* Make adjustments based on code review
2022-06-23 16:51:33 -04:00
75259b44bf Properly calculate the total train iterations and recalculate num epochs in no_trainer scripts (#17856) 2022-06-23 15:46:01 -04:00
7c1b91281f Index RNG states by global rank in saves (#17852) 2022-06-23 12:53:50 -04:00
7cf52a49de Nezha Pytorch implementation (#17776)
* wip

* rebase

* all tests pass

* rebase

* ready for PR

* address comments

* fix styles

* add require_torch to pipeline test

* remove remote image to improve CI consistency

* address comments; fix tf/flax tests

* address comments; fix tf/flax tests

* fix tests; add alias

* repo consistency tests

* Update src/transformers/pipelines/visual_question_answering.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* address comments

* Update src/transformers/pipelines/visual_question_answering.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* merge

* wip

* wip

* wip

* most basic tests passes

* all tests pass now

* relative embedding

* wip

* running make fixup

* remove bert changes

* fix doc

* fix doc

* fix issues

* fix doc

* address comments

* fix CI

* remove redundant copied from

* address comments

* fix broken test

Co-authored-by: Sijun He <sijunhe@Sijuns-MacBook-Pro.local>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-06-23 12:36:22 -04:00
acb709d551 Change no trainer image_classification test (#17635)
* Adjust test arguments and use a new example test
2022-06-23 11:11:16 -04:00
e70abdad1b Update modeling_cvt.py (#17846)
As shown in the colab notebook I added the missing type hints for " CvtForImageClassification
CvtModel
"
2022-06-23 16:08:36 +01:00
1a7ef3349f Fix broken test for models with batchnorm (#17841)
* Fix tests that broke when models used batchnorm

* Initializing the model twice does not actually...
...give you the same weights each time.
I am good at machine learning.

* Fix speed regression
2022-06-23 15:59:53 +01:00
18c263c4b6 BLOOM minor changes on tokenizer (#17823)
* few fixes:

- hardcode tokenizer padding side
- remove unused args

* few fixes:

- added new attribute on TokenizerTesterMixin
- added new slow test
- remove unused arg on tokenizer class

* make style

* Update src/transformers/models/bloom/tokenization_bloom_fast.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* make quality

* apply changes

- remove new attribute
- redefine test on the class

* add comments

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2022-06-23 15:57:12 +02:00
6f29029b05 Improve performance docs (#17750)
* add skeleton files

* fix cpu inference link

* add hint to make clear that single gpu section contains general info

* add new files to ToC

* update toctree to have subsection for performance

* add "coming soon" to the still empty sections

* fix missing title

* fix typo

* add reference to empty documents

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-06-23 14:51:54 +02:00
5bc779ae28 Fix an error message in BigBird (#17840)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-23 14:43:53 +02:00
3eed5530ec Fix properties of unset special tokens in non verbose mode (#17797)
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2022-06-23 14:40:13 +02:00
b2fdbaccdd change message (#17836) 2022-06-23 14:39:48 +02:00
d37a68e685 Add missing type hints for QDQBertModel (#17783)
* Feat: add missing type hints for QDQBertModel

* fix: ran black and isort

* feat: Add missing output type for QDQBertModel

* feat: Add type hints for QDQBertLMHeadModel and models starting with QDQBertFor

* fix: add missing return type for QDQBertModel

* fix: remove wrong return type for QDQBertEmbeddings

* fix: readded config argument to load_tf_weights_in_qdqbert

* fix: add BertConfig type to BertEmbeddings config due t checko error in ci

* fix: removed config type hints to avoid copy checks
2022-06-23 12:58:43 +01:00
4297f44b63 Update type hints modeling_yoso.py (#17827)
* Update modeling_yoso.py

* make fixup

* Update modeling_yoso.py

That should be it copied from previous PR
2022-06-23 12:37:29 +01:00
5cce3076c4 TF: generate without tf.TensorArray (#17801) 2022-06-23 12:28:08 +01:00
ab223fc148 add doctests for DETR (#17786)
* add: check labels for detr object detection doctests

* add: check shapes

* add: add detr to documentation_tests.py

* fix: make fixup output

* fix: add a comment
2022-06-23 13:26:14 +02:00
8d634b70e0 Fix push CI artifact path (#17788)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-23 12:31:22 +02:00
df8e6804c0 Offload fixes (#17810)
* Offload fixes

* Add a test
2022-06-22 12:23:07 -04:00
0d0c392c45 CLI: use hub's create_commit (#17755)
* use create_commit

* better commit message and description

* touch setup.py to trigger cache update

* add hub version gating
2022-06-22 16:50:21 +01:00
c366ce1011 Bump numpy from 1.21.0 to 1.22.0 in /examples/research_projects/lxmert (#17817)
Bumps [numpy](https://github.com/numpy/numpy) from 1.21.0 to 1.22.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst)
- [Commits](https://github.com/numpy/numpy/compare/v1.21.0...v1.22.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-22 09:29:40 -04:00
af0d21e741 Bump numpy in /examples/research_projects/visual_bert (#17816)
Bumps [numpy](https://github.com/numpy/numpy) from 1.21.0 to 1.22.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst)
- [Commits](https://github.com/numpy/numpy/compare/v1.21.0...v1.22.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-22 09:29:28 -04:00
56b83cf049 initial commit (#17818) 2022-06-22 14:26:03 +02:00
1357038164 Add logits_processor parameter, used by generate, to Seq2SeqTrainer methods evaluate and predict (#17805)
* Add logits_processor parameter, used by `generate`, to `Seq2SeqTrainer` methods `evaluate` and `predict`

* Add all generate parameters to `Seq2SeqTrainer`, and also to `QuestionAnsweringSeq2SeqTrainer` which overrides it

* Remove `self._num_beams` from trainer classes

* - Run fixup
- Fix "Constraint" not exposed
- Fix synced_gpus to actually read from param

* Use kwargs

* Copy kwargs before making changes to it

* Fix style issues unused imports
2022-06-22 08:11:39 -04:00
16c6eb7ca1 Flax sharded (#17760) 2022-06-22 07:04:35 +02:00
3b00b623b7 Fix top_k_top_p_filtering having unexpected behavior (#17744)
- Fix `top_k_top_p_filtering` not passing `filter_value` to
   `TopPLogitsWarper` causing any top-p filtered logits to be -inf
   instead of specified value

 - Add corresponding test
2022-06-21 21:35:55 +02:00
3ccff0d400 Remove duplicate code (#17708) 2022-06-21 21:30:40 +02:00
26a6a42608 Improve error message Union not allowed (#17769)
* Improve error message Union not allowed

* make style

* Update src/transformers/hf_argparser.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-21 14:27:01 -04:00
abc400b06a Add final_layer_norm to OPT model (#17785)
* Add final_layer_norm to OPT model

* Add JAX and TF version

* Fix Keras name

* Woops

* Allow for non breaking change

* Apply suggestions from code review

* add tests

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-06-21 20:26:36 +02:00
52404cbad4 Properly check for a TPU device (#17802) 2022-06-21 13:39:55 -04:00
ef23fae596 Fix test for BF16 detection (#17803) 2022-06-21 18:31:15 +02:00
7cced021fa TF Sharded (#17713)
* initial commit

* update modeeling tf utils

* quality

* clean and update args

* update

* remove potential bug

* code quality

* update

* update max shard

* update tests for sharding from pretrained

* fix remaining test

* make style

* h5py if tf available

* update and fix test

* fix test

* style

* modified push to hub to support shard for TF

* quick fix

* update code

* merge branch main and style

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update based on reviews

* update doc

* update and style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update based on reviews

* fix typo

* style

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-21 18:01:08 +02:00
f47afefb21 Use 5e-5 For BigBird PT/Flax equivalence tests (#17780)
* rename to check_pt_flax_outputs

* update check_pt_flax_outputs

* use 5e-5 for BigBird PT/Flax test

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-21 17:55:26 +02:00
6a5272b205 Prepare transformers for v0.8.0 huggingface-hub release (#17716)
* Prepare CI for v0.8.0

* pin hfh (revert before merge)

* Revert "pin hfh (revert before merge)"

This reverts commit a0103140e1c77b810ffcb735192968bc03be3e1f.

* Test rc3

* Test latest rc

* Unpin to the RC

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-06-21 11:51:18 -04:00
7bc88c0511 Fix forward reference imports in DeBERTa configs (#17800) 2022-06-21 11:21:06 -04:00
27e907386a Fix Automatic Download of Pretrained Weights in DETR (#17712)
* added use_backbone_pretrained

* style fixes

* update

* Update detr.mdx

* Update detr.mdx

* Update detr.mdx

* update using doc py

* Update detr.mdx

* Update src/transformers/models/detr/configuration_detr.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-21 16:45:35 +02:00
b681e12d59 [ViTMAE] Fix docstrings and variable names (#17710)
* Fix docstrings and variable names

* Rename x to something better

* Improve messages

* Fix docstrings and add test for greyscale images

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-06-21 15:56:00 +02:00
3fab17fce8 Add link to notebook (#17791)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-06-21 14:53:08 +02:00
da2bd2ae96 [CodeParrot] Near-deduplication with jaccard similarity (#17054)
* deduplication draft

* update style

* update style test

* dummy test main

* rename modules

* rename functions

* return extremes in deduplicate_clusters

* update style

* cast str for gzip

* update doc string

* time processing

* use dataset map to compute minhash

* fill value for short token

* remove da map method

* update style

* use share object to multiprocess

* update style

* use f-string and minor fix

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>

* update style

* use module parameters

* change ds_dedup to ds_filter

* save ds_dedup

* mv test to script tests

* make jaccard threshold a parameter of deduplicate_dataset

* update style

* add doc strings

* update style

* add doc string for DuplicationIndex

* save files into data dir

* update readme

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>

* make near deduplication optional

* move near deduplication in README

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* use f string

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>
2022-06-21 14:23:36 +02:00
eb16be415a add onnx support for deberta and debertav2 (#17617)
* add onnx support for debertav2

* debertav2 -> deberta-v2 in onnx features file

* remove causal lm

* add deberta-v2-xlarge to onnx tests

* use self.type().dtype() in xsoftmax

Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

* remove hack for deberta

* remove unused imports

* Update src/transformers/models/deberta_v2/configuration_deberta_v2.py

Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

* use generate dummy inputs

* linter

* add imports

* add support for deberta v1 as well

* deberta does not support multiple choice

* Update src/transformers/models/deberta/configuration_deberta.py

Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

* Update src/transformers/models/deberta_v2/configuration_deberta_v2.py

Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>

* one line ordered dict

* fire build

Co-authored-by: Jingya HUANG <44135271+JingyaHuang@users.noreply.github.com>
2022-06-21 11:04:15 +02:00
8fcbe275c3 Add UL2 (just docs) (#17740)
* Add UL2
Co-authored-by: Daniel Hesslow <Daniel.Hesslow@gmail.com>

* Correct naming

* sort better

* up

* apply sylvains suggestion
2022-06-21 10:24:50 +02:00
da27c4b398 Update modeling_longt5.py (#17777)
On line 180, `torch.tensor(-1.0, xxx)` gives the error "TypeError: 'float' object cannot be interpreted as an integer" 
This is because the dtype here is `int64`.  For `dtype=int64`, this needs to simply be `-1`.  
This impacts the long-t5-tglogbal-x model.  It does not impact the long-t5-local-x version which does not appear to call this line.
2022-06-20 18:49:08 +02:00
d3cb28886a Not use -1e4 as attn mask (#17306)
* Use torch.finfo(self.dtype).min

* for GPTNeoX

* for Albert

* For Splinter

* Update src/transformers/models/data2vec/modeling_data2vec_audio.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix -inf used in Bart-like models

* Fix a few remaining -inf

* more fix

* clean up

* For CLIP

* For FSMT

* clean up

* fix test

* Add dtype argument and use it for LayoutLMv3

* update FlaxLongT5Attention

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-06-20 16:16:16 +02:00
fdb120805c Fix cache for GPT-Neo-X (#17764)
* Fix cache for GPT-Neo-X

* Add more tests
2022-06-20 08:43:36 -04:00
a2d34b7c04 deprecate is_torch_bf16_available (#17738)
* deprecate is_torch_bf16_available

* address suggestions
2022-06-20 08:40:11 -04:00
132402d752 TF: BART compatible with XLA generation (#17479)
* Also propagate changes to blenderbot, blenderbot_small, marian, mbart, and pegasus
2022-06-20 11:07:46 +01:00
6589e510fa Attempt to change Push CI to workflow_run (#17753)
* Use workflow_run event for push CI

* change to workflow_run

* Add comments

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-18 08:35:03 +02:00
0d92798b45 Added translation of index.mdx to Portuguese Issue #16824 (#17565)
* Added translation of installation.mdx to Portuguese, as well
as default templates of _toctree.yml and _config.py

* [ build_documentation.yml ] - Updated doc_builder to build
documentation in Portuguese.
[ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.

* [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.

[ pipeline_tutorial.mdx ] - Grammar changes.

* [ accelerate.mdx ] - Translated to Portuguese the acceleration tutorial.

* [ multilingual.mdx ] - Added portuguese translation for multilingual tutorial.

[ training.mdx ] - Added portuguese translation for training tutorial.

* [ preprocessing.mdx ] - WIP

* Update _toctree.yml

* Adding Pré-processamento to _toctree.yml

* Update accelerate.mdx

* Nits and eliminate preprocessing file while it is ready

* [ index.mdx ] - Translated to Portuguese the index apresentation page.

* [ docs/source/pt ] - Updated _toctree.yml to match newest translations.

* Fix build_pr_documentation.yml

* Fix index nits

* nits in _toctree

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-06-17 20:06:05 -04:00
522a9ece4b Save huggingface checkpoint as artifact in mlflow callback (#17686)
* Fix eval to compute rouge correctly for rouge_score

* styling

* moving sentence tokenization to utils from run_eval

* saving ckpt in mlflow

* use existing format of args

* fix documentation

Co-authored-by: Swetha Mandava <smandava@nvidia.com>
2022-06-17 14:14:03 -04:00
21a772426d Migrate HFDeepSpeedConfig from trfrs to accelerate (#17623)
* Migrate HFDeepSpeedConfig from trfrs to accelerate

* add `accelerate` to testing dep

* addressing comments

* addressing comments

Using `_shared_state` and avoiding object creation. This is necessary as `notebook_launcher` in `launcers.py` checks `len(AcceleratorState._shared_state)>0` to throw an error.

* resolving comments

1. Use simple API from accelerate to manage the deepspeed config integration
2. Update the related documentation

* reverting changes and addressing comments

* docstring correction

* addressing nits

* addressing nits

* addressing nits 3

* bumping up the accelerate version to 0.10.0

* resolving import

* update setup.py to include deepspeed dependencies

* Update dependency_versions_table.py

* fixing imports

* reverting changes to CI dependencies for "run_tests_pipelines_tf*" tests

These changes didn't help with resolving the failures and I believe this needs to be addressed in another PR.

* removing `accelerate` as hard dependency

Resolves issues related to CI Tests

* adding `accelerate` as dependency for building docs

resolves failure in Build PR Documentation test

* adding `accelerate` as dependency in "dev" to resolve doc build issue

* resolving comments

1. adding `accelerate` to extras["all"]
2. Including check for accelerate too before import HFDeepSpeedConfig from there

Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* resolving comments

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-17 23:29:35 +05:30
e44a569fef Bump notebook in /examples/research_projects/lxmert (#17743)
Bumps [notebook](http://jupyter.org) from 6.4.10 to 6.4.12.

---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 12:10:33 -04:00
5089a2d412 Bump notebook in /examples/research_projects/visual_bert (#17742)
Bumps [notebook](http://jupyter.org) from 6.4.10 to 6.4.12.

---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 12:10:17 -04:00
2d7c1bb192 feat: add num_workers arg to DataLoader (#17751) 2022-06-17 10:53:45 -04:00
ca169dbdf1 Enable PyTorch nightly build CI (#17335)
* nightly build pytorch CI

* fix working dir

* change time and event name

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-17 16:42:27 +02:00
3c7e56fbb1 Remove needless file 2022-06-16 12:21:12 -04:00
7c6ec195ad v4.21.0.dev0 2022-06-16 12:20:53 -04:00
36d4647993 Refine Bf16 test for deepspeed (#17734)
* Refine BF16 check in CPU/GPU

* Fixes

* Renames
2022-06-16 11:27:58 -04:00
f44e2c2b6f Fix tf shared embedding (#17730)
* fix the naming

* from pt in test for now

* make style

* slow test and removed from_pt
2022-06-16 14:17:47 +02:00
2eadb7e54a Fix mask token in the example (#17725)
VIsualBert uses bert-base-uncased tokenizer, therefore, instead of {mask}, the mask token should be [MASK]
2022-06-16 07:54:45 -04:00
3981ee8650 Sort the model doc Toc Alphabetically (#17723) 2022-06-15 16:11:56 -04:00
66f893320c normalize keys_to_ignore (#17722) 2022-06-15 11:59:11 -07:00
c3c62b5d2c CLI: Add flag to push TF weights directly into main (#17720)
* Add flag to push weights directly into main
2022-06-15 19:25:50 +01:00
6ebeeeef81 Update requirements.txt (#17719) 2022-06-15 13:51:41 -04:00
50415b84d6 Revert "Change push CI to run on workflow_run event (#17692)" (#17717)
This reverts commit b76290f44ce432e2ee7678a76036e8509167bae6.
2022-06-15 18:42:43 +02:00
7f14839f55 [Wav2Vec2Conformer] Official release (#17709)
* [Wav2Vec2Conformer] Official release

* remove from not-in-readme
2022-06-15 18:34:15 +02:00
242cc6e265 Documentation: RemBERT fixes (#17641)
* rembert: fix python codeblock

* rembert: use correct google/rembert checkpoint name in documentation

* rembert: use correct google/rembert checkpoint name in TF documentation
2022-06-15 18:17:59 +02:00
b76290f44c Change push CI to run on workflow_run event (#17692)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-15 17:43:31 +02:00
d453ea6120 fix tolerance for a bloom slow test (#17634) 2022-06-14 18:14:12 +02:00
120649bf3a [LongT5] disable model parallel test (#17702) 2022-06-14 17:27:39 +02:00
7ec9128e5a FX function refactor (#17625)
* Function refactor

* Update src/transformers/utils/fx.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-14 17:22:21 +02:00
edb672ac5e Add BloomForSequenceClassification and BloomForTokenClassification classes (#17639)
* add new bloom classes

* (feat) add bloom classification tests; make style

* style: change import in test

* add some typehints to bloom classes

* merge main into branch

* fix: input checking in bloom seq classification

* fix tests

* change model class tests

* fix few tests

- more tests should pass
- one test left

* make token classifier return hidden states

* style: make BLOOM typehints consistent

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2022-06-14 17:10:12 +02:00
bd43151af4 Swin main layer (#17693)
* Swin models call TFSwinMainLayer

* Tidy up
2022-06-14 14:28:12 +01:00
3960ce917f Include a comment to reflect Amy's contributions (#17689)
* Add note on amy's contribution.

Co-authored-by: Amy Roberts <aeroberts4444@gmail.com>

* remove non-tech comment.

Co-authored by: Amy Roberts <aeroberts4444@gmail.com>

Co-authored-by: Amy Roberts <aeroberts4444@gmail.com>
2022-06-14 09:15:39 -04:00
9068fa6c57 Rag end2end new (#17650)
* check

* update the RAG-end2end with new PL and RAY

* removed unwanted comments
2022-06-14 14:56:32 +02:00
53496ac510 [LongT5] Rename checkpoitns (#17700) 2022-06-14 14:10:50 +02:00
3b29c9fdb7 Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference (#17153)
* add jit mode option and model wrap

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refine code

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add ut and refine code

* code refine

* refine code

* add inference doc

* Update src/transformers/trainer.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add cpu inference performance doc

* Update perf_infer_cpu.mdx

* Update perf_infer_cpu.mdx

* Update performance.mdx

* Update _toctree.yml

* refine jit func naming

* Update _toctree.yml

* Delete perf_infer_gpu_one.mdx

* Update perf_infer_cpu.mdx

* Update docs/source/en/perf_infer_cpu.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add none check before jit

* Update docs/source/en/perf_infer_cpu.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/perf_infer_cpu.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-06-14 07:56:47 -04:00
df15703b42 Fix doc builder Dockerfile (#17435)
* Fix doc builder Dockerfile

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-14 09:58:48 +02:00
a72f1c9f5b Add LongT5 model (#16792)
* Initial commit

* Make some fixes

* Make PT model full forward pass

* Drop TF & Flax implementation, fix copies etc

* Add Flax model and update some corresponding stuff

* Drop some TF things

* Update config and flax local attn

* Add encoder_attention_type to config

* .

* Update docs

* Do some cleansing

* Fix some issues -> make style; add some docs

* Fix position_bias + mask addition + Update tests

* Fix repo consistency

* Fix model consistency by removing flax operation over attn_mask

* [WIP] Add PT TGlobal LongT5

* .

* [WIP] Add flax tglobal model

* [WIP] Update flax model to use the right attention type in the encoder

* Fix flax tglobal model forward pass

* Make the use of global_relative_attention_bias

* Add test suites for TGlobal model

* Fix minor bugs, clean code

* Fix pt-flax equivalence though not convinced with correctness

* Fix LocalAttn implementation to match the original impl. + update READMEs

* Few updates

* Update: [Flax] improve large model init and loading #16148

* Add ckpt conversion script accoring to #16853 + handle torch device placement

* Minor updates to conversion script.

* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM

* gpu support + dtype fix

* Apply some suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* * Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies

* Remove caching logic for local & tglobal attention

* Apply another batch of suggestions from code review

* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx

* Fix converting script + revert config file change

* Revert "Remove caching logic for local & tglobal attention"

This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.

* Stash caching logic in Flax model

* Make side relative bias used always

* Drop caching logic in PT model

* Return side bias as it was

* Drop all remaining model parallel logic

* Remove clamp statements

* Move test files to the proper place

* Update docs with new version of hf-doc-builder

* Fix test imports

* Make some minor improvements

* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray

* Fix TGlobal for ONNX conversion + update docs

* fix _make_global_fixed_block_ids and masked neg  value

* update flax model

* style and quality

* fix imports

* remove load_tf_weights_in_longt5 from init and fix copies

* add slow test for TGlobal model

* typo fix

* Drop obsolete is_parallelizable and one warning

* Update __init__ files to fix repo-consistency

* fix pipeline test

* Fix some device placements

* [wip]: Update tests -- need to generate summaries to update expected_summary

* Fix quality

* Update LongT5 model card

* Update (slow) summarization tests

* make style

* rename checkpoitns

* finish

* fix flax tests

Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
2022-06-13 22:36:58 +02:00
1690094bdb Add FP16 Support for SageMaker Model Parallel (#17386)
* Add FP16 supporot for sagemaker model parallel

* minor fix

* fix indentation

* handle mix precision exception for smmp

* minor fix

* remove amp implementation on SMMP

* remove redundant stuff

* reformat trainer

* restyling

* reformat
2022-06-13 13:45:25 -04:00
4aabf9b52c enable cpu distribution training using mpirun (#17570)
* enable cpu distribution training using mpirun

*command like
*    mpirun -n 2 python3 run_qa.py --no_cuda --xpu_backend ccl xxxx
*MASTER_ADDR and MASTER_PORT should be set as env
*export MASTER_ADDR=127.0.0.1
*export MASTER_PORT=29500

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix according to the review comment

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* use accelerate logic for cpu distribution training to set "RANK","LOCAL_RANK","WORLD_SIZE" environment

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2022-06-13 13:34:07 -04:00
457d4a3245 Add Ray's scope to training arguments (#17629)
* allow scope from trainer arg

* add ray_scope to training args

* escape double quotes

* make style && quality

* attempt to solve doc style issues

* splitting up URLs for style

* make fixup

* Update src/transformers/training_args.py

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

* make style

Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2022-06-13 10:44:06 -04:00
5483388631 Update modeling_gpt_neox.py (#17575)
I'm guessing that the intention was to have the `_no_split_modules` class attribute for `GPTNeoXPreTrainedModel` to be set to `["GPTNeoXLayer"]`, akin to how its set as `["GPTJBlock"]` for `GPTJPreTrainedModel`.

If this is incorrect, please feel free to just close the PR.

Thanks!
2022-06-13 09:59:27 -04:00
a1344dbfb9 Fix dtype getter (#17668)
* Fix dtype getters

* Proper fix for dtype getter

* Style and commant

* Always use last for consistency

* Quality
2022-06-13 09:34:45 -04:00
73083581a4 explicitly set utf8 for Windows (#17664) 2022-06-13 08:05:45 -04:00
c1daf724ea Fixed documentation typo, parameter name is evaluation_strategy, not eval_strategy (#17669)
Co-authored-by: Saint <saint@st-mini.local>
2022-06-13 08:02:06 -04:00
66336dc183 Add Visual Question Answering (VQA) pipeline (#17286)
* wip

* rebase

* all tests pass

* rebase

* ready for PR

* address comments

* fix styles

* add require_torch to pipeline test

* remove remote image to improve CI consistency

* address comments; fix tf/flax tests

* address comments; fix tf/flax tests

* fix tests; add alias

* repo consistency tests

* Update src/transformers/pipelines/visual_question_answering.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* address comments

* Update src/transformers/pipelines/visual_question_answering.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* merge

* Update src/transformers/models/auto/modeling_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* merge

Co-authored-by: Sijun He <sijunhe@Sijuns-MacBook-Pro.local>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-13 07:49:44 -04:00
a5282ab4bc Fix typo in adding_a_new_model README (#17679) 2022-06-13 03:22:07 -04:00
224bde91ca Avoid GPU OOM for a TF Rag test (#17638)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-10 18:50:29 +02:00
39e146146b fix typo from emtpy to empty (#17643) 2022-06-10 18:50:11 +02:00
13e875cc07 [Generation Test] Make fast test actually fast (#17661) 2022-06-10 18:49:03 +02:00
b4eef63a1d [Data2Vec] Speed up test (#17660) 2022-06-10 18:48:58 +02:00
5e428b71b4 [BigBirdFlaxTests] Make tests slow (#17658)
* [BigBirdFlaxTests] Make tests slow

* up

* correct black with new version
2022-06-10 16:54:14 +02:00
3114df41f4 update README.md (#17657)
- use CodeParrot scores of v1.1
- change evaluation command to use accelerate
2022-06-10 15:55:24 +02:00
c99ddcc441 🐛 Properly raise RepoNotFoundError when not authenticated (#17651)
* Raise RepoNotFoundError in case of 401

* Include changes from revert-17646-skip_repo_not_found

* Add a comment

* 💄 Code quality

* 💚 Update `get_from_cache` test

* 💚 Code quality & skip failing test
2022-06-10 15:41:53 +02:00
35b16032cb Fixes #17128 . (#17356)
VisibleDeprecationWarning is addressed by specifying dtype=object when creating numpy array.
Update code based on review feedback.
Undo whitespace changes to tokenization_utils_base.py.

Co-authored-by: I like data <ilikedata@nym.hush.com>
2022-06-10 09:36:48 -04:00
b88090914d Fix dtype getters (#17656) 2022-06-10 07:43:13 -04:00
fd1e67033e Add skip logic for attentions test - Levit (#17633) 2022-06-10 12:46:30 +02:00
cdaed367b0 Fix style 2022-06-10 11:53:44 +02:00
2bc305107a Fix style 2022-06-10 11:20:14 +02:00
1d463303fe Bump cookiecutter in /examples/research_projects/decision_transformer (#17645)
Bumps [cookiecutter](https://github.com/cookiecutter/cookiecutter) from 1.7.2 to 2.1.1.
- [Release notes](https://github.com/cookiecutter/cookiecutter/releases)
- [Changelog](https://github.com/cookiecutter/cookiecutter/blob/master/HISTORY.md)
- [Commits](https://github.com/cookiecutter/cookiecutter/compare/1.7.2...2.1.1)

---
updated-dependencies:
- dependency-name: cookiecutter
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-10 04:27:51 -04:00
49becbaa55 Enable crop_center method to handle (W, H, C) images (#17626)
* enable crop_center method to handle (W, H, C) images

* minor style and comment edits
2022-06-10 09:18:42 +03:00
6e93d94792 Move Clip image utils to image_utils.py (#17628)
* move clip image utils to image_utils.py

* dont default to square images

* fix typo, revert change to test file

* edit convert_rgb comments
2022-06-10 09:12:17 +03:00
af4a1ecad0 Skip tests until bug is fixed. (#17646) 2022-06-09 21:32:19 -04:00
e0b58fb5ba Translation/autoclass (#17615)
* Add Italian translation for autoclass_tutorial.mdx

* Fix synthesis

Co-authored-by: martina.fumanelli <martina.fumanelli@MBP-di-martinafumanelli.local>
2022-06-09 20:56:44 -04:00
df1ec6b122 didn't exist in pt-1.9 (#17644) 2022-06-09 16:01:01 -07:00
fba0b6a820 convert assertion to raised exception in debertav2 (#17619)
* convert assertion to raised exception in debertav2

* change assert to raise exception in deberta

* fix messages
2022-06-09 18:18:29 -04:00
da0bed5f4a Pre-build DeepSpeed (#17607)
* pre-build deepspeed

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-09 23:02:33 +02:00
75343de938 [modeling_utils] torch_dtype/auto floating dtype fixes (#17614)
* [modeling_utils] torch_dtype/auto fixes

* add test

* apply suggestions

* add missing fallback

* Renaming things

* Use for else

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-06-09 10:18:26 -07:00
c38f4e1f1c Running a pipeline of float16. (#17637)
When we're preparing the tensors for CPU for postprocessing, we need
to upgrade the `float16` to `float32` since CPUs don't have instructions
for `[b]float16`.
2022-06-09 19:04:42 +02:00
90ed9ae2d1 fix use_amp rename after pr 17138 (#17636) 2022-06-09 09:38:48 -07:00
c70dacde94 Fix very long job failure text in Slack report (#17630)
* Fix very long job failure text in Slack report

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-09 18:37:48 +02:00
2351729f7d Adding top_k argument to text-classification pipeline. (#17606)
* Adding `top_k` and `sort` arguments to `text-classification` pipeline.

- Deprecate `return_all_scores` as `top_k` is more uniform with other
  pipelines, and a superset of what `return_all_scores` can do.
  BC is maintained though.
  `return_all_scores=True` -> `top_k=None`
  `return_all_scores=False` -> `top_k=1`

- Using `top_k` will imply sorting the results, but using no argument
  will keep the results unsorted for backward compatibility.

* Remove `sort`.

* Fixing the test.

* Remove bad doc.
2022-06-09 18:33:10 +02:00
29080643eb Mention in the doc we drop support for fairscale (#17610) 2022-06-09 12:20:39 -04:00
9fc34235fa Use shape_list to safely get shapes for Swin (#17591)
* Use shape_list to safely get shapes

* Add relevant test

* Tidy and add metrics

* Resolve dynamic shaping issues and move test

* Tidy up and all samples in batch

* Formatting
2022-06-09 15:50:50 +02:00
e0be053e43 Add ONNX support for ConvNeXT (#17627) 2022-06-09 09:31:02 -04:00
5323094a22 Add ONNX support for ResNet (#17585)
* Add ONNX support for ResNet

* Add ONNX test

* make fix-copies
2022-06-09 08:44:27 -04:00
ca2a55e9df BLOOM (#17474)
* adding template

* update model

* model update

* update conf for debug model

* update conversion

* update conversion script

* update conversion script

* fix missing keys check

* add tests to test the tokenizer in the local machine

* Change variable name

* add tests on xnli dataset

* add more description

* add descriptions + clearer code

* clearer code

* adding new tests + skipping few tests because of env problems

* change comment

* add dtype on the configuration

* add test embeddings

* add hardcoded test

* fix dtype issue

* adding torch.float16 to config

* adding more metrics (min, max, mean)

* add sum

* now the test passes with almost equal

* add files for conversion - test passes on cpu  gpu

* add final changes

* cleaning code

* add new args in the docstring

* fix one liner function

* remove macros

* remove forward attention

* clean up init funtion

* add comments on the issue

* rm scale mask softmax

* do make style

* fix dtype in init

* fixing for loop on att probs

* fix style with black

* fix style + doc error

* fix and debug CI errors (docs + style)

* some updates

- change new operations
- finally add scaled softmax
- added new args in the config

* make use cache working

* add changes

- save sharded models
- final changes on the modeling script

* add changes

- comment on alibi
- add TODO on seq length

* test commit

- added a text to test the commit

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* final changes

- attention mask change
- generation works on BS176b

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* changes - model + conversion

* move to correct dir

* put ,

* fex fixes

* fix tokenizer autodoc

* fix minor CI issues

* fix minor CI issues

* fix minor CI issues

* fix style issue

* fix minor import issues

* fix few issues

* remove def main on the test

* add require torch

* replace decorator with 'with'

* fix style

* change to bloom

* add quick fix tokenizer

* fix tokenizer file

* fix tokenizer

- merge tests
- small fixes

* fix import issue

* add bloom to readme

* fix consistency

* Update docs/source/en/model_doc/bloom.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

fix comment issues on file headers

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix doc issue

* small fix - modeling test

* some changes

- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests

* remove useless division

* more tests should pass

* more tests should pass

* more tests should pass

* let's try this one

-add alibi offset
- remove all permutes to make the grad operations work
- finger crossed

* refactor

- refactor code
- style changes
- add new threshold for test

* major changes

- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test

* modify readme

* small fixes

* small fix

- better threshold for a test

* remove old test file from fetcher

* fix small typo

* major change

- change BloomLMHead to BloomForCausalLM

* remove onnx config

* major changes

- refactor the code
- remove asserts
- change tol for test

* make style

* small change

* adding a slow test + commenting old ones for now

* make style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

* fix duplicates

* cleaning comments on config

* clean a bit conversion file

* refacor a bit modeling file

* refactor tokenizer file

* fix tokenization test issue

* fix tokenization issue #2

* fix tokenization issue second try

* fix test issue

* make style + add suggestions

* change test fetcher

* try this one

- slow tests should pass
- finger crossed

* possible final changes

* make style

* try fix padding side issue

* fix side

* fix padding issue

* fix ko-readme

* fix config auto

* cleaning modeling file

* keep bloom in caps in ko

* update config docs

* remove pretraining_pp

* remove model parallel

* update config

- add correct config files

* fix duplicates

* fix fetcher

* fix refactor issue

- remove divide function

* try to remove alibi

* small fixes

- fix alibi
- remove seq length
- refactor a bit the code

* put correct values

- fix bos and eos token ids

* fix attention mask loop

Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>

* small fixes:

- remove skip bias add

* small fixes

- fix typo in readme
- fix typos in config

* small changes

- remove a test
- add reconstruction test
- change config

* small changes

- change Scaled Softmax to BloomScaledSoftmax

* small fixes

- fix alibi dtype

* major changes

- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add dosctring

* fix readmes

* major changes

- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now

* refactor a bit

* refactor a bit

* put correct name on test

* change docstring

* small changes

- fix docstring modeling
- fix test tolerance

* fix small nit

- take dtype from tensors in the conversion script

* minor fix

- fix mdx issue

* minor fix

- change config docstring

* forward contrib credits from PR14084

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* apply modifications

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* resolve softmax upcast

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>

* final changes modeling

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'

* merge commit

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* apply suggestions

Apply suggestions from Stas comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix gradient checkpointing

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* add slow but exact

* add accelerate compatibility

Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>

* forward contrib credits

Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix torch device on tests

* make style

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix nits

Co-authored-by: patrickvonplaten<patrickvonplaten@users.noreply.github.com>

* remove final nits

* fix doc

- add more details on the doc
- add links to checkpoints

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply suggestions

Co-authored-by: sgugger <sgugger@users.noreply.github.com>

* put test torchscript to false

* Update src/transformers/models/bloom/modeling_bloom.py

Co-authored-by: justheuristic <justheuristic@gmail.com>

* fix alibi

- create alibi only once

* add small doc

* make quality

* replace torch.nn

* remove token type emb

* fix fused op + output bias

* add fused op

- now can control fused operation from config

* remove fused op

* make quality

* small changes

- remove unsed args on config
- removed bias gelu file
- make the model torchscriptable
- add torchscript slow tests

* Update src/transformers/models/bloom/modeling_bloom.py

* fix slow

* make style

* add accelerate support

* add bloom to deepspeed tests

* minor changes

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* minor change

* slow tests pass

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/bloom.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* minor changes:

- change docstring
- add link to paper

Co-authored-by: Thomwolf <thomwolf@gmail.com>
Co-authored-by: Thomas Wolf <thomas@huggingface.co>
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sIncerass <sheng.s@berkeley.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-06-09 12:00:40 +02:00
dfc76b2542 has_attentions - consistent test skipping logic and tf tests (#17495) 2022-06-09 09:50:03 +02:00
66e8656778 CLI: Print all different tensors on exception (#17612) 2022-06-08 18:30:03 +01:00
e9d5138768 TF: Merge PT and TF behavior for Bart when no decoder_input_ids are passed (#17593)
* Merge PT and TF behavior
2022-06-08 17:42:23 +01:00
e160a5dd62 Fix telemetry URL (#17608) 2022-06-08 11:34:05 -04:00
7d0b6fc340 CLI: Properly detect encoder-decoder models (#17605) 2022-06-08 16:15:59 +01:00
ee82c86bdc Fix link for community notebooks (#17602)
* Fix link for community notebooks

This fixes the link for community notebooks due to reorganization.

* Replace old link with fully link to the doc page

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-08 10:51:39 -04:00
34097b3304 Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (#17138)
* init PR

* fix import ipex

* minor fix on bf16

* refine optimizer

* refine args notes

* refine code

* refine ipex optimize args

* refine half_precision_backend

* black format

* isort format

* isort format files

* flake8 format

* doc builder format

* refine codes

* remove jit and optim bits

* black preview format

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refine code

* refine notes

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* code refine

* add ipex ut

* add performance cpu doc

* link to the cpu doc from main perf doc

* install ipex into CI's docker

* Update perf_train_cpu.mdx

* Update docs/source/en/perf_train_cpu.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update perf_train_cpu.mdx

* Update perf_train_cpu.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-06-08 09:41:57 -04:00
ae7bae8fe7 fix train_new_from_iterator in the case of byte-level tokenizers (#17549) 2022-06-08 15:30:41 +02:00
264128cb9d Explicit versions in docker files (#17586)
* Update docker file

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-08 15:04:22 +02:00
9d99489f2f Add TFData2VecVision for semantic segmentation (#17271)
* feat: initial implementation of data2vec segmentation model in TF.

* chore: minor corrections to make the segmenter work.

* chore: removed unncessary files.

* chore: add tests and other modifications.

* fix: loss computation for segmentation.

* chore: remove unused variable.

* chore: formatting.

* added a dummy adaptive pooling layer.

* removed unnecessary file.

* potentially add identifiers to layer names.

* fix: layer naming.

* chore: removed unnecessary print.

* Skipping unneeded test

* chore: add logging to debug tolerance.

* fix: segmentation tests for tfdata2vecvision

* chore: make style.

* fix: layer names, assertion to be resolved.

* Bumping test tolerance a bit

* chore: bump the tol in PT test.

Co-authored-by: matt <rocketknight1@gmail.com>
2022-06-08 14:03:18 +01:00
78c695eb62 CLI: add stricter automatic checks to pt-to-tf (#17588)
* Stricter pt-to-tf checks; Update docker image for related tests

* check all attributes in the output

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-08 10:45:10 +01:00
c6cea5a78c fix (#17589)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-08 01:50:59 +02:00
119e3c0fc8 M-CTC-T Model (#16402)
* added cbs to notebooks, made copy-paste error fix in generation_utils

* initial push for mctc model

* mctc feature extractor done

* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.

* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.

* passing attention, now struggling to figure out how attention masks make sense here

* works when excluding attention masks. ask later how one would integrate attention maskshere

* bizarre configuration error (model prefix comes first in config dict json and messes up the order)

* all passing but bizzarre config dict ordering issue when to_dict

* passing all major tests

* feature extraction, processor, tokenizer added & tests passing

* style & consistency & other logistical fixes

* copy paste fix

* model after feature extraction working

* commiting final feature extraction results; need to fix normalization

* feature extraction passing tests; probably should add tests on the specific flashlight-copied functions?

* delete print ; format code a bit

* fixing tests

* passing major tests

* fixing styles

* completed tokenization test with real example; not sure if these values are entirely correct.

* last test fixes from local

* reverting accidentally included custom setup configs

* remove load tf weights; fix config error

* testing couldnt import featureextractor

* fix docs

* fix docs

* resolving comments

* style fixes

* style fixes

* Update to MCTCConv1dSubSampler

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* relposemb fixes

* conv1d name issue; expecting config fail with paraentheses

* fix config issue

* fix config issue

* fix config issue

* change everything to MCTCT

* fixing naming change errors

* archive list

* copyrights and docs

* copyrights and docs

* copyrights and docs

* merge resolution

* move tests, fix to changed optionaldependency structure

* test directories changed

* fixing tests

* how to avoid tf tests?

* how to avoid tf tests?

* tests passing locally

* allow mctctprocessor imported any env

* allow mctctprocessor imported any env

* fixed second round of feedback, need to fix docs

* doc changes not being applied

* all fixed

* style fix

* feedback fixes

* fix copies and feature extraction style fix

* Update tests/models/visual_bert/test_modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* copy paste huggingface:main visual bert

* added eof newline to visual bert; all tests are passing otherwise

* fix slow tests by adding attention mask

* change model id to speechbrain

* make fix-copies

* fix readme unwanted deletes

* fixing readmes, make fix-copies

* consistent M-CTC-T naming

* Update src/transformers/models/mctct/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* all fixed but variable naming

* adjust double quotes

* fixed variable names

* copyright and mr quilter

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct slow tests

* make fix-copies

* Update src/transformers/models/mctct/configuration_mctct.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/mctct/configuration_mctct.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* m-ctc-t not mctct

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-08 00:33:07 +02:00
706bb8364d quicktour.mdx en -> pt translation (#17074)
* Quicktour Portuguese Translation

Translated quicktour.mdx until line 161

* Finished translating quicktour.mdx

Ready to upload and adjust eventual .mdx or translation mistakes.

* Add _toctree.yml and fix nits

* Fixed pt-br mdx syntax problem

Closed <frameworkcontent> instance

* Changed </frameworkcontent> line

* Copied missing block from english version of quicktour.mdx

* Reviwed the entire file once again. It should be working now.

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-06-07 17:35:05 -04:00
5c8f601007 Fx support for Deberta-v[1-2], Hubert and LXMERT (#17539)
* Support for deberta and deberta-v2

* Support for LXMert

* Support for Hubert

* Fix for pt1.11

* Trigger CI
2022-06-07 18:05:20 +02:00
3cab90279f Add examples telemetry (#17552)
* Add examples telemetry

* Alternative approach

* Add to all other examples

* Add to templates as well

* Put framework separately

* Same for TensorFlow
2022-06-07 11:57:52 -04:00
9e72eb4416 Skip disk offload test for T5 2022-06-07 11:11:40 -04:00
b118730745 Fix gendered sentence in Spanish translation(#17558) 2022-06-07 14:09:39 +02:00
b6a65ae52a Fix circular import in onnx.utils (#17577)
* Fix circular import in onnx.utils

* Add comment for test fetcher

* Here too

* Style
2022-06-07 08:00:36 -04:00
9aa230aa2f Use latest stable PyTorch/DeepSpeed for Push & Scheduled CI (#17417)
* update versions

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-07 11:53:05 +02:00
ad71965246 Remove circular imports in layoutlm/__init__.py (#17576) 2022-06-06 22:41:41 +02:00
19a8a3036d Add magic method to our TF models to convert datasets with column inference (#17160)
* Add method to call to_tf_dataset() with column inference

* Add test for dataset creation

* Add a default arg for data collator

* Fix test

* Fix call with non-dev version of datasets

* Test correct column removal too

* make fixup

* More tests to make sure we remove unwanted columns

* Fix test to avoid predicting on unbuilt models

* Fix test to avoid predicting on unbuilt models

* Fix test to remove unwanted head mask columns from inputs

* Stop pushing your debug breakpoints to the main repo of the $2bn company you work for

* Skip the test in convnext because no grouped conv support

* Drop bools from the dataset dict

* Make style

* Skip the training test for models whose input dicts don't give us labels

* Skip transformerXL in the test because it doesn't return a simple loss

* Skip TFTapas because of some odd NaN losses

* make style

* make fixup

* Add docstring

* fixup

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove breakpoint from tests

* Fix assert, add requires_backends

* Protect tokenizer import with if TYPE_CHECKING

* make fixup

* Add noqa, more fixup

* More rearranging for ~* aesthetics *~

* Adding defaults for shuffle and batch_size to match to_tf_dataset()

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-06 15:53:49 +01:00
d28b7aa8cb [deepspeed / testing] reset global state (#17553)
* [deepspeed] fix load_best_model test

* [deepspeed] add state reset on unittest tearDown
2022-06-06 07:49:25 -07:00
34a886fce3 Translation/italian: added pipeline_tutorial.mdx [Issue: #17459] (#17507)
* added toctree.yml file

* first translation

* added pipeline_tutorial.mdx translation

added pipeline_tutorial.mdx
updated _toctree.yml

* updated pipeline_tutorial.mdx

* updated _toctree.yml

Updated preprocessing and training

* updated preprocessing.mdx

start translation

* Update _toctree.yml

* Delete preprocessing.mdx

* Update _toctree.yml

* updated _toctree.yml

* added preprocessing

* Update _toctree.yml

* updated _toctree.yml

* undo

* Revert "undo"

This reverts commit 5d38d768752dc80918bf60ada9d185f98b742520.

* Revert "Revert "undo""

This reverts commit 8aa0830b587f915ca7d154ebca282b782e82bd92.
2022-06-06 10:35:20 -04:00
2e37ef35d1 Remove RuntimeErrors for NaN-checking in 20B (#17563) 2022-06-06 09:29:06 -04:00
f6ad0e0556 Add installation.mdx Italian translation (#17530)
* Add the Italian translation of the file installation.mdx and edit _toctree

* Add the Italian translation of the file installation.mdx and edit _toctree
2022-06-06 07:48:08 -04:00
4aed1dc81b Adding the Portuguese version of the tasks/token_classification.mdx documentation (#17492)
* add tasks/token_classification pt doc structure

* add tasks/token_classification pt doc translation

* add tasks/token_classification pt doc translation
2022-06-06 07:47:34 -04:00
da71df1afc fix integration test levit (#17555) 2022-06-06 13:47:32 +02:00
26e5e129b4 [deepspeed] fix load_best_model test (#17550) 2022-06-03 11:19:03 -07:00
72f5b94984 Update index.mdx (#17547)
This PR updates our Expert Acceleration Program image with a new image featuring our experts.

This is similar to our Transformers/README.md image update that has proven to be successful.
2022-06-03 12:56:37 -05:00
c4e58cd8ba Clean imports to fix test_fetcher (#17531)
* Clean imports to fix test_fetcher

* Add dependencies printer

* Update utils/tests_fetcher.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Fix Perceiver import

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-06-03 12:34:41 -04:00
254d9c068e Update run_glue_no_trainer.py (#17546) 2022-06-03 12:29:37 -04:00
8343901263 Fix all offload and MP tests (#17533) 2022-06-03 09:59:13 -04:00
1c57242d7b Fix bug - layer names and activation from previous refactor (#17524)
* Fix activation and layers in MLP head

* Remove unused import
2022-06-03 09:31:10 -04:00
babeff5524 Add support for Perceiver ONNX export (#17213)
* Start adding perceiver support for ONNX

* Fix pad token bug for fast tokenizers

* Fix formatting

* Make get_preprocesor more opinionated (processor priority, otherwise tokenizer/feature extractor)

* Clean docs format

* Minor cleanup following @sgugger's comments

* Fix typo in docs

* Fix another docs typo

* Fix one more typo in docs

* Update src/transformers/onnx/utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/onnx/utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/onnx/utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-03 07:40:22 -04:00
5c17918fe4 Allow from transformers import TypicalLogitsWarper (#17477)
* Allow from transformers import TypicalLogitsWarper

* Added TypicalLogitsWarper

* Allow from transformers import TypicalLogitsWarper

* Allow from transformers import TypicalLogitsWarper

* Allow from transformers import TypicalLogitsWarper

* Allow from transformers import TypicalLogitsWarper

Added TypicalLogitsWarper

Allow from transformers import TypicalLogitsWarper

Allow from transformers import TypicalLogitsWarper

Allow from transformers import TypicalLogitsWarper
2022-06-03 11:08:35 +02:00
607acd4fbd Add Gated-SiLU to T5 (#17420)
* Add gated-silu to t5 architecture to support UL2

* Fix error message

* formatting

* formatting again

* refactor

* fix classnames in _init_weights

* remove is_gated

* add test

* fix test

* Try without the test?

* Add back the test.

* Improve error message.

Co-authored-by: Daniel Hesslow <daniel@lighton.ai>
2022-06-03 10:56:37 +02:00
1c220ced8e Update URL for Hub PR docs (#17532) 2022-06-02 21:52:30 +02:00
013462c57b fix OPT-Flax CI tests (#17512) 2022-06-02 18:52:46 +02:00
2f59ad1609 [trainer/deepspeed] load_best_model (reimplement re-init) (#17151)
* [trainer/deepspeed] load_best_model

* to sync with DS PR #1947

* simplify

* rework load_best_model test

* cleanup

* bump deepspeed>=0.6.5

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
2022-06-02 09:14:21 -07:00
046c5ea906 Implemented loss for training AudioFrameClassification (#17513)
* Implemented loss for training AudioFrameClassification

* reported changes in wav2vec2 main class and used make copies to propagate

* running black for code formatting
2022-06-02 17:40:02 +02:00
085321c9a1 Update configuration_auto.py (#17527) 2022-06-02 10:37:00 -04:00
048dd73bba Check list of models in the main README and sort it (#17517)
* Script for README

* Fix copies

* Complete error message
2022-06-02 08:10:08 -04:00
588d8f1f26 Fix when Accelerate is not installed (#17518) 2022-06-02 07:45:41 -04:00
f128ccb997 Clean README in post release job as well. (#17519) 2022-06-02 07:44:03 -04:00
216499bfcc Fix CI tests hang forever (#17471)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-02 10:30:54 +02:00
659b27fd26 Print more library versions in CI (#17384)
* print more lib. versions and just befor test runs

* update print_env_pt.py

* rename to print_env

* Disable warning + better job name

* print python version

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-02 10:24:16 +02:00
0932adb3e8 Split push CI into 2 workflows (#17369)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-02 10:19:26 +02:00
58fb3c9f98 Fix Tapas tests (#17510)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-01 21:01:32 +02:00
ca1f1c8685 CLI: tool to convert PT into TF weights and open hub PR (#17497) 2022-06-01 18:52:07 +01:00
3766df4fe1 Fix flakey no-trainer test (#17515) 2022-06-01 13:40:49 -04:00
028d4b7c8b Deal with the error when task is regression (#16330) 2022-06-01 11:15:53 -04:00
84aaadd8c5 Adding LeViT Model by Facebook (#17466)
* levit files

* levit tests

* weights script

* weights script

* update

* style fixes

* few minor corrections

* Added teacher model

* edit docs

* fix-copies

* style fixes

* pr error resolved

* Update README.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/index.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/levit.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/levit.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/levit.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/levit.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/configuration_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/configuration_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/feature_extraction_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* suggested pr changes

* style fixes

* minor bug

* update

* minor doc edit

* style

* Update src/transformers/models/levit/feature_extraction_levit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/levit/feature_extraction_levit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/models/levit/test_modeling_levit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/levit/modeling_levit.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/levit/feature_extraction_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* residual layer readable

* style

* Update docs/source/en/model_doc/levit.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/feature_extraction_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/feature_extraction_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/feature_extraction_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/feature_extraction_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/modeling_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/modeling_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/modeling_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update tests/models/levit/test_feature_extraction_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* change checkpoints and style

* update

* minor changes

* Update src/transformers/models/levit/modeling_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/levit/modeling_levit.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-06-01 17:06:20 +02:00
1d2b57b8a2 Fix CTRL tests (#17508)
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-01 16:27:23 +02:00
693720e567 Fix LayoutXLMProcessorTest (#17506)
* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-06-01 16:26:37 +02:00
4d1ce39683 Debug LukeForMaskedLM (#17499)
* add a test for a word only input

* make LukeForMaskedLM work without entity inputs

* update test

* add LukeForMaskedLM to MODEL_FOR_MASKED_LM_MAPPING_NAMES

* restore pyproject.toml

* empty line at the end of pyproject.toml
2022-06-01 10:03:06 -04:00
4390151ba2 Fix MP and CPU offload tests for Funnel and GPT-Neo (#17503) 2022-06-01 09:59:40 -04:00
6813439fdc Exclude Databricks from notebook env (#17496) 2022-06-01 09:00:11 -04:00
3042ea4f6f Fix tokenizer type annotation in pipeline(...) (#17500)
I think you mean to accept either an instance of `PreTrainedTokenizer` or `PreTrainedTokenizerFast` inside of the `pipeline(...)` factory function, if the `tokenizer` argument isn't a `str`.
2022-06-01 08:43:28 -04:00
bdc01711d6 Refactor classes to inherit from nn.Module instead of nn.Sequential (#17493)
* Adapt Maskformer, VAN, ResNet and RegNet modules to inherit from nn.Module
2022-06-01 13:36:19 +01:00
b1160c0b56 Fix wav2vec2 export onnx model with attention_mask error (#16004)
* Fix wav2vec2 export onnx model with attention_mask error

* fix repository_consistency
2022-06-01 13:30:58 +02:00
d91da4c6df Add warning when using older version of torch for ViltFeatureExtractor (#16756)
* Update feature_extraction_vilt.py

* apply black

* Update imports

* Change warning to logging

* Use logger instead of logging.logging

* make fixup

* Move error message

* Update src/transformers/models/vilt/feature_extraction_vilt.py

Co-authored-by: Xing Han Lu <xhlperso@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2022-06-01 07:15:38 -04:00
24092b1464 Fix typo of variable names for key and query projection layer (#17155)
self.pos_proj and self.pos_q_proj should be changed to self.pos_key_proj and self.pos_query_proj as same as PyTorch implements.
2022-06-01 11:38:44 +01:00
811da2b8c2 Fixed wrong error message for missing weight file (#17216) 2022-06-01 06:24:20 -04:00
4f38808e9e Add OnnxConfig for SqueezeBert iss17314 (#17315)
* add onnx config for SqueezeBert

* add test for onnx config for SqueezeBert

* add automatically updated doc for onnx config for SqueezeBert

* Update src/transformers/onnx/features.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/transformers/models/squeezebert/configuration_squeezebert.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-06-01 06:16:15 -04:00
ba286fe7d5 [GPT2Tokenizer] Fix GPT2 with bos token (#17498) 2022-05-31 20:06:48 +02:00
7822a9b7a7 Opt in flax and tf (#17388)
* initial commit

* add init file

* update globakl init

* update index and dummy objects

* style

* update modelling auto

* fix initi typo in src/transformers

* fix typo in modeling tf auto, opt was in wrong mapping name

* fixed a slow test : saved_model

* style

* fix positionnal embedding if no position id is provided

* update tf test

* update test flax requirements

* fixed serialization

* update

* update tf name to allow smooth convertion

* update flax tests

* style

* fix test typo

* fix tf typo test

* add xla for generate support in causal LM

* fixed bug

* cleaned tf tests

* style

* removed from PT for slow tests

* fix typp

* opt test as slow

* trying to fix GPT2 undefined

* correct documentation and add to test doc

* update tf doc

* fix doc

* fake commit

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* update test based on review

* merged main layer for functionning test

* fixup + quality

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* update long comment

* make fix copies

Co-authored-by: Arthur <arthur@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-31 18:41:22 +02:00
f394a2a50d [Json configs] Make json prettier for all saved tokenizer files & ensure same json format for all processors (tok + feat_extract) (#17457)
* [Json dump] Make json prettier

* correct more tokenizeirs

* more patterns

* add aggressive test

* the aggressive test was actually useful :-)

* more tests

* Apply suggestions from code review
2022-05-31 17:07:30 +02:00
6ee1474b67 Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens() (#17119)
* Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens()

For tokenizers with a small number of special tokens or special tokens
with consecutive token IDs, this reduces the time complexity of creating
the trie from quadratic to linear, see also #16936.

* Extend explanation of batching added tokens
2022-05-31 16:36:45 +02:00
52e7c92920 Add HF.co for PRs / Issues regarding specific model checkpoints (#17485)
* Add HF.co for PRs / Issues regarding specific model checkpoints

* Update .github/ISSUE_TEMPLATE/config.yml

Co-authored-by: Julien Chaumond <julien@huggingface.co>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2022-05-31 15:58:39 +02:00
dfc38463b8 Setup for Italian translation and add quicktour.mdx translation (#17472)
* Setup for Italian translation and add first document

- Add 'it' folder for files translated into Italian
- Add _config.py and _toctree.yml files
- Add translation of quicktour.mdx

* Fix style issue of italian documentation files

* Add 'it' to the languages section in the .github/workflows

* Remove - installation from _toctree for Italian

* Translation for index file

- Add index to _toctree.yml
- Add translation of index.mdx

* Fix typo in docs/source/it/index.mdx

* Translate code comments in docs/source/it/_config.py

Co-authored-by: Martina Fumanelli <martinafumanelli@Martinas-MBP.homenet.telecomitalia.it>
2022-05-31 09:57:43 -04:00
8f8b3cbce4 Fix checkpoint name (#17484)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-31 15:40:48 +02:00
400b30936a Docker image build in parallel (#17434)
* docker image build in parallel

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-31 15:39:03 +02:00
5af38953bb Added XLM onnx config (#17030)
* Add onnx configuration for xlm

* Add supported features for xlm

* Add xlm to models exportable with onnx

* Add xlm architecture to test file

* Modify docs

* Make code quality fixes
2022-05-31 09:26:06 -04:00
567d9c061d Disk offload fix (#17428)
* Fix offload to disk for big models

* Add test

* Fix test for other models
2022-05-31 09:16:18 -04:00
975dd2bbbc TF: GPT-2 generation supports left-padding (#17426)
* TF GPT-2 now properly works with left padding

* throw a warning when eos token == pad token and there is no attention mask
2022-05-31 14:06:44 +01:00
c1a138613d Fix ViTMAEModelTester (#17470)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-31 15:01:54 +02:00
b0e0ac8a67 [Generate] Fix output scores greedy search (#17442) 2022-05-31 14:59:49 +02:00
2ef09ecfb8 Fix nits (#17349) 2022-05-31 08:41:54 -04:00
28d0048218 Fx support for multiple model architectures (#17393)
* Support for Bart and LayoutLM, and partial support for XLNet

* Support for mbart

* A lot of new models supported

* Support for other models

* LayoutLM fix

* Use strings instead of classes
2022-05-31 10:02:55 +02:00
04681c1d81 typo IBERT in __repr__ quant_mode (#17398)
fix #17397
2022-05-31 03:48:10 -04:00
13fd67346a Fix typo (remove parenthesis) (#17415) 2022-05-31 03:21:32 -04:00
d156898f3b Improve notrainer examples (#17449)
* improve no-trainer examples

* Trigger CI

* adding comment to clarify tracker init on main process

* Trigger CI

* Trigger CI

* Trigger CI
2022-05-28 00:06:31 +05:30
7999ec125f [OPT] Fix bos token id default (#17441) 2022-05-26 18:24:12 +02:00
98f6e1ee87 Fix model parallelism test (#17439) 2022-05-26 09:57:12 -04:00
7535d92e71 Pin protobouf that breaks TensorBoard in PyTorch (#17440) 2022-05-26 09:56:55 -04:00
2295bcaea8 Spanish translation of the file preprocessing.mdx (#16299)
* Spanish translation of the file training.mdx

* Settings - Spanish translation of the file training.mdx

* Latest changes to the Spanish translation of the training.mdx file

* Delete Hugging.mdx

* Last changes to the training fil Espanish version

* Latest modifications

* Latest changes, document ready for PR

* Nits

* Spanish translation of the preprocessing file

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Update docs/source_es/preprocessing.mdx

* Nits and add preprocessing to _toctree.yml

Co-authored-by: Yhary Arias <yharystefa@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-26 07:28:14 -04:00
8f46ac9849 Spanish translation of the files sagemaker.mdx and image_classification.mdx (#17262)
* Duplication of the source eng file

* Spanish translation of the file multilingual.mdx

* Update docs/source_es/multilingual.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source_es/multilingual.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source_es/multilingual.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source_es/multilingual.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source_es/multilingual.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source_es/multilingual.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source_es/multilingual.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Fix nits and finish translation

* Spanish translation of sagemaker.mdx

* Was deleted in main

* Security saving

* Complete translation of image_classification.mdx

* Nits

* nits

* Update docs/source/es/image_classification.mdx

* Add files to _toctree.yml

* Fix toctree and add tasks folder

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-25 19:10:16 -04:00
5e7f085fcc Added es version of bertology.mdx doc (#17255)
* added bertology es doc

* toctree fix

* Update docs/source/es/bertology.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/bertology.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/bertology.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* change position of bertology in _toctree.yml

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-25 18:46:53 -04:00
70484a8d74 Adding the Portuguese version of the tasks/sequence_classification.mdx documentation (#17352)
* add sequence_classification pt doc structure

* add Portuguese tasks/sequence_classification.mdx
2022-05-25 16:21:27 -04:00
a9eca74372 Wav2vec2 finetuning shared file system (#17423)
* fix_torch_device_generate_test

* remove @

* [Fix shared file system]

Co-authored-by: Patrick von Platen <patrick@huggingface.co>
2022-05-25 22:04:43 +02:00
740a1574f1 fix link in performance docs (#17419) 2022-05-25 20:54:43 +02:00
284fc6c0bb Add link to Hub PR docs in model cards (#17421) 2022-05-25 20:38:56 +02:00
35e2d13f3c Upd AutoTokenizer.from_pretrained doc examples (#17416) 2022-05-25 11:35:50 -04:00
897a8dd89f Support compilation via Torchdynamo, AOT Autograd, NVFuser (#17308)
* Support compilation via Torchdynamo, AOT Autograd, NVFuser

* Address comments

* Lint

* Stas comments - missing quality test

* Lintere

* Quality test

* Doc lint

* Reset CUDA peak mem

* Add CustomTrainer

* require a single gpu

Co-authored-by: Stas Bekman <stas@stason.org>
2022-05-25 11:16:09 -04:00
31484afbed Add test for new model parallelism features (#17401) 2022-05-25 10:51:27 -04:00
56b35ce3eb Make check_init script more robust and clean inits (#17408) 2022-05-25 07:23:56 -04:00
bd908e9bb1 Fix README localizer script (#17407) 2022-05-25 07:23:40 -04:00
4d727bd2df Fix expected value for OPT test test_inference_no_head (#17395)
* Fix expected value

* 5e-5

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-25 11:19:06 +02:00
1ef9a1ed4a Bump tensorflow in /examples/research_projects/decision_transformer (#17400)
Bumps [tensorflow](https://github.com/tensorflow/tensorflow) from 2.8.0 to 2.8.1.
- [Release notes](https://github.com/tensorflow/tensorflow/releases)
- [Changelog](https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md)
- [Commits](https://github.com/tensorflow/tensorflow/compare/v2.8.0...v2.8.1)

---
updated-dependencies:
- dependency-name: tensorflow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 19:36:55 -04:00
71e602725b [WIP] Adding GPT-NeoX-20B (#16659)
* initial

* first try

* working 20B

* 20B tokenizers

* Docs

* Import fixes for missing classes

* Update docs, fixup

* black formatting

* isort

* flake

* dummy objects

* documentation

* Documentation yml

* more docs

* tweaks for tests

* tokenization auto

* fix neox tests

* test

* test

* einsum

* address PR feedback

* Documentation

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gpt_neox/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/gpt_neox/configuration_gpt_neox.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove undefined LaTeX syntax

* Update to full url to avoid confusion about if that's supposed to refer to the Hub

* fix auto

* move tests

* documentation fix

* more doc fixes

* test refactor

* fix import

* fix import

* fix import

* fix import

* fix import

* style fixes

* More modeling fixes

Co-authored-by: Jason Phang <zp489@gr057.hpc.nyu.edu>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-24 09:31:10 -04:00
374a2f693f Clean up CLIP tests (#17380)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-05-24 14:51:26 +02:00
d980929803 Enabling imageGPT auto feature extractor. (#16871)
* Enablign `imageGPT` auto feature extractor.

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Small updates.

* Update after rebase to use `input_ids` instead of `pixel_values`.

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-24 12:30:46 +02:00
31ee80d556 Add LayoutLMv3 (#17060)
* Make forward pass work

* More improvements

* Remove unused imports

* Remove timm dependency

* Improve loss calculation of token classifier

* Fix most tests

* Add docs

* Add model integration test

* Make all tests pass

* Add LayoutLMv3FeatureExtractor

* Improve integration test + make fixup

* Add example script

* Fix style

* Add LayoutLMv3Processor

* Fix style

* Add option to add visual labels

* Make more tokenizer tests pass

* Fix more tests

* Make more tests pass

* Fix bug and improve docs

* Fix import of processors

* Improve docstrings

* Fix toctree and improve docs

* Fix auto tokenizer

* Move tests to model folder

* Move tests to model folder

* change default behavior add_prefix_space

* add prefix space for fast

* add_prefix_spcae set to True for Fast

* no space before `unique_no_split` token

* add test to hightligh special treatment of added tokens

* fix `test_batch_encode_dynamic_overflowing` by building a long enough example

* fix `test_full_tokenizer` with add_prefix_token

* Fix tokenizer integration test

* Make the code more readable

* Add tests for LayoutLMv3Processor

* Fix style

* Add model to README and update init

* Apply suggestions from code review

* Replace asserts by value errors

* Add suggestion by @ducviet00

* Add model to doc tests

* Simplify script

* Improve README

* a step ahead to fix

* Update pair_input_test

* Make all tokenizer tests pass - phew

* Make style

* Add LayoutLMv3 to CI job

* Fix auto mapping

* Fix CI job name

* Make all processor tests pass

* Make tests of LayoutLMv2 and LayoutXLM consistent

* Add copied from statements to fast tokenizer

* Add copied from statements to slow tokenizer

* Remove add_visual_labels attribute

* Fix tests

* Add link to notebooks

* Improve docs of LayoutLMv3Processor

* Fix reference to section

Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-05-24 09:53:45 +02:00
13541b4aa2 Add support for device_map="auto" to OPT (#17382) 2022-05-23 15:25:51 -04:00
71cced8ae3 OPTForCausalLM lm_head input size should be config.word_embed_proj_dim (#17225) 2022-05-23 21:20:29 +02:00
56f50590d5 Use Accelerate in from_pretrained for big model inference (#17341)
* Initial work

* More or less finished with first draft

* Update src/transformers/modeling_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix randomly initialized weights

* Update src/transformers/modeling_utils.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comments

* Rename DeepSpeed folder to temporarily fix the test issue?

* Revert to try if Accelerate fix works

* Use latest Accelerate release

* Quality and fixes

* Style

* Quality

* Add doc

* Test + fix

* More blocks

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-05-23 14:32:21 -04:00
2e7e4280aa Traced models serialization and torchscripting fix (#17206)
* Fix torch.jit.script and pickling issues

* Fix get_attr issues

* Fix import in function

* Fix GPT-J and T5 tracing for torch=1.11

* Gate graph surgery on torch version

* Modeling minor changes to enable TorchScripting

* Model serialization / deserialization test

* Remove _assert_is_none users
2022-05-23 17:50:40 +02:00
1cd01b0af3 Fix Comet ML integration (#17381)
Callback function `on_train_end` crashed if Comet ML integration was
used but `COMET_MODE` set to `DISABLE`
2022-05-23 10:43:10 -04:00
c86aad6110 Fix cvt docstrings (#17367) 2022-05-23 16:11:09 +02:00
7b8cb26953 Correct & Improve Doctests for LayoutLMv2 (#17168)
* add inference example to LayoutLMv2ForQuestionAnswering, passing doctest

* add loss example to LayoutLMv2ForQuestionAnswering, passing doctest

* Add correct doctest for LayoutLMv2ForTokenClassification, passing doctest

* add correct doctest for LayoutLMv2ForSequenceClassification, passing test

* add correct doctest for LayoutLMv2Model, passing test

* make fixup

* fix to address review comments

* make style

* fix doctest line break issue, add to documentaiton_tests.txt, address review comments

* move comment about layoutlmv2 dependencies to the doc page

* format doc page as suggested

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* delete extraneous backtick

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-23 08:02:31 -04:00
b48ac1a094 Fix CodeParrot training script (#17291)
* average loss over batches and accumulated steps for tracking

* fix layernorm weight decay

* use AdamW from Pytorch instead of Transformers

* add shuffling of sequences inside the batches

* add shuffling of sequences inside the batches

* add logging dir and reformat code

* fix lr tracking

* remove Mistral scaling

* keep Mistral scaling

* reformat code

* fix error

* fix error

* use shuffling function from Pytorch

* remove argument for shuffling batch sequences as it isn't optional

* update package versions and install accelerate from source

* remove unused package

* Update loss average over accumulated steps

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* Update loss average over accumulated steps

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* use one shuffle buffer argument

* compute avg_loss in one line

Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2022-05-23 12:55:35 +02:00
b9bb417324 Fix a typo relative_postion_if_large -> relative_position_if_large (#17366) 2022-05-20 18:41:12 +02:00
3fd7de49f4 Pin dill to fix examples (#17368)
* Pin dill for now

* Try this version?

* force install

* Actually use dep in testing

* Try a larger pin
2022-05-20 11:00:58 -04:00
54192058f3 [Test OPT] Add batch generation test opt (#17359)
* up

* up
2022-05-19 23:46:26 +02:00
48c22691e3 Fix bug in Wav2Vec2 pretrain example (#17326) 2022-05-19 22:42:44 +02:00
5d6feecf16 fix for 17292 (#17293) 2022-05-19 22:21:19 +02:00
518bd02c9b [Generation] Fix Transition probs (#17311)
* [Draft] fix transition probs

* up

* up

* up

* make it work

* fix

* finish

* update
2022-05-19 22:17:02 +02:00
e8714c0307 [OPT] Run test in lower precision on GPU (#17353)
* [OPT] Run test only in half precision

* up

* up

* up

* up

* finish

* fix on GPU

* Update tests/models/opt/test_modeling_opt.py
2022-05-19 22:15:36 +02:00
2b282296f1 Adding batch_size test to QA pipeline. (#17330) 2022-05-19 14:28:12 -04:00
a4386d7e40 [BC] Fixing usage of text pairs (#17324)
* [BC] Fixing usage of text pairs

The BC is actually preventing users from misusing the pipeline since
users could have been willing to send text pairs and the pipeline would
instead understand the thing as a batch returning bogus results.

The correct usage of text pairs is preserved in this PR even when that
makes the code clunky.

Adds support for {"text":..,, "text_pair": ...} inputs for both dataset
iteration and more explicit usage to pairs.

* Updating the doc.

* Update src/transformers/pipelines/text_classification.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/text_classification.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_text_classification.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* quality.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2022-05-19 10:29:16 +02:00
3601aa8fc9 [tests] fix copy-n-paste error (#17312)
* [tests] fix copy-n-paste error

* fix
2022-05-18 16:00:47 -07:00
1b20c970a2 Fix ci_url might be None (#17332)
* fix

* Update utils/notification_service.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-05-18 21:49:08 +02:00
6aad3872ce fix (#17337)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-18 15:26:44 -04:00
1762ded30a Fix metric calculation in examples and setup tests to run on multi-gpu for no_trainer scripts (#17331)
* Fix length in no_trainer examples

* Add setup and teardown

* Use new accelerator config generator to automatically make tests able to run based on environment
2022-05-18 14:17:40 -04:00
6e195eb9de docs for typical decoding (#17186)
Co-authored-by: Jader Martins <jadermcs94@gmail.com>
2022-05-18 19:18:43 +02:00
060fe61dff Not send successful report (#17329)
* send report only if there is any failure

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-18 19:07:48 +02:00
b3b9f99ed2 Fix test_t5_decoder_model_past_large_inputs (#17320)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-18 17:57:23 +02:00
6da76b9c2a Add onnx export cuda support (#17183)
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-05-18 17:52:13 +02:00
adc0ff2502 Add CvT (#17299)
* Adding cvt files

* Adding cvt files

* changes in init file

* Adding cvt files

* changes in init file

* Style fixes

* Address comments from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Format lists in docstring

* Fix copies

* Apply suggestion from code review

Co-authored-by: AnugunjNaman <anugunjjha@gmail.com>
Co-authored-by: Ayushman Singh <singhayushman13@protonmail.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-18 17:47:18 +02:00
4710702837 Fix style 2022-05-18 10:46:40 -04:00
5fdb54ece7 Add Information Gain Filtration algorithm (#16953)
* Add information gain filtration algorithm

* Complying with black requirements

* Added author

* Fixed import order

* flake8 corrections

Co-authored-by: Javier Turek <javier.turek@intel.com>
2022-05-18 10:39:02 -04:00
91ede485a7 Fix typo (#17328) 2022-05-18 10:29:53 -04:00
fe28eb9452 remove (#17325)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-18 10:06:41 -04:00
2cb2ea3fa1 Accepting real pytorch device as arguments. (#17318)
* Accepting real pytorch device as arguments.

* is_torch_available.
2022-05-18 10:06:24 -04:00
1c9d1f4ca8 Updating the docs for max_seq_len in QA pipeline (#17316) 2022-05-18 15:46:12 +02:00
60ad73448c [T5] Fix init in TF and Flax for pretraining (#17294)
* fix init

* Apply suggestions from code review

* fix

* finish

* Update src/transformers/modeling_tf_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-18 15:08:56 +02:00
7ba1d4e51f Add type hints for ProphetNet (Pytorch) (#17223)
* added type hints to prophetnet

* reformatted with black

* fix bc black misformatted some parts

* fix imports

* fix imports

* Update src/transformers/models/prophetnet/configuration_prophetnet.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* update OPTIONAL type hint and docstring

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-05-18 13:23:47 +01:00
d6b8e9cec7 Add trajectory transformer (#17141)
* Add trajectory transformer


Fix model init


Fix end of lines for .mdx files

Add trajectory transformer model to toctree

Add forward input docs

Fix docs, remove prints, simplify prediction test

Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Update docs, more descriptive comments

Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Update readme

Small comment update and add conversion script

Rebase and reformat

Fix copies

Fix rebase, remove duplicates

Fix rebase, remove duplicates

* Remove tapex

* Remove tapex

* Remove tapex
2022-05-17 19:07:43 -04:00
c35264007b fix (#17310) 2022-05-17 18:34:31 -04:00
d9050dc768 [LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing (#17112)
* [LED] fixed global_attention_mask not passed for generation + docs clarification for gradient checkpointing

* LED docs clarification

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [LED] gradient_checkpointing=True should be passed to TrainingArguments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [LED] docs: remove wrong word

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* [LED] docs fix typo

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-05-17 23:44:37 +02:00
bad358398a Add support for pretraining recurring span selection to Splinter (#17247)
* Add SplinterForSpanSelection for pre-training recurring span selection.

* Formatting.

* Rename SplinterForSpanSelection to SplinterForPreTraining.

* Ensure repo consistency

* Fixup changes

* Address SplinterForPreTraining PR comments

* Incorporate feedback and derive multiple question tokens per example.

* Update src/transformers/models/splinter/modeling_splinter.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/splinter/modeling_splinter.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Jean Vancoppenole <jean.vancoppenolle@retresco.de>
Co-authored-by: Tobias Günther <tobias.guenther@retresco.de>
Co-authored-by: Tobias Günther <github@tobigue.de>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-05-17 23:42:14 +02:00
0511305549 Add PR author in CI report + merged by info (#17298)
* Add author info to CI report

* Add merged by info

* update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-17 12:56:58 -04:00
032d63b976 Fix dummy creation script (#17304) 2022-05-17 12:56:24 -04:00
986dd5c5bf Fix style 2022-05-17 12:50:14 -04:00
38ddab10da Doctest longformer (#16441)
* Add initial doctring changes

* make fixup

* Add TF doc changes

* fix seq classifier output

* fix quality errors

* t

* swithc head to random init

* Fix expected outputs

* Update src/transformers/models/longformer/modeling_longformer.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2022-05-17 18:32:12 +02:00
10704e1209 [Test] Fix W2V-Conformer integration test (#17303)
* [Test] Fix W2V-Conformer integration test

* correct w2v2

* up
2022-05-17 18:20:36 +02:00
28a0811652 Improve mismatched sizes management when loading a pretrained model (#17257)
- Add --ignore_mismatched_sizes argument to classification examples

- Expand the error message when loading a model whose head dimensions are different from expected dimensions
2022-05-17 17:58:14 +02:00
1f13ba818e correct opt (#17301) 2022-05-17 15:48:23 +02:00
349f1c85d3 Rewrite TensorFlow train_step and test_step (#17057)
* Initial commit

* Better label renaming

* Remove breakpoint before pushing (this is your job)

* Test a lot more in the Keras fit() test

* make fixup

* Clarify the case where we flatten y dicts into tensors

* Clarify the case where we flatten y dicts into tensors

* Extract label name remapping to a method
2022-05-17 14:36:23 +01:00
651e48e1e5 Fix tests of mixed precision now that experimental is deprecated (#17300)
* Fix tests of mixed precision now that experimental is deprecated

* Fix mixed precision in training_args_tf.py too
2022-05-17 14:14:17 +01:00
6d211429ec fix retribert's test_torch_encode_plus_sent_to_model (#17231) 2022-05-17 14:33:13 +02:00
ec7f8af106 [ConvNeXT] Fix drop_path_rate (#17280)
* Fix drop_path_rate

* Fix TF's drop path rate
2022-05-17 07:37:48 -04:00
a26ab95e30 Fix wrong PT/TF categories in CI report (#17272)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-17 09:32:47 +02:00
1ac2b8fa7f Fix missing job action button in CI report (#17270)
* use matrix.machine_type

* fix job names used in job_link

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-17 08:31:06 +02:00
5a9957358c Add Wav2Vec2Conformer (#16812)
* save intermediate

* add wav2vec2 conformer

* add more code

* more

* first test passes

* make all checkpoints work

* update

* up

* more clean ups

* save clean-up

* save clean-up

* save more

* remove bogus

* finalize design conformer

* remove vision

* finish all tests

* more changes

* finish code

* add doc tests

* add slow tests

* fix autoconfig test

* up

* correct docstring

* up

* update

* fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update docs/source/en/model_doc/wav2vec2-conformer.mdx

* upload

* save copied from

* correct configs

* fix model outputs

* add to docs

* fix imports

* finish

* finish code

* correct copied from

* correct again

* correct make fix

* improve make fix copies

* save

* correct fix copy from

* correct init structure

* correct

* fix import

* apply suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
2022-05-17 00:43:16 +02:00
f0395cf58e Fix test_model_parallelization (#17249)
* Fix test_model_parallelization

* Modify
2022-05-16 23:30:49 +02:00
e705e1267c [Tests] Fix slow opt tests (#17282)
* fix opt tests

* remove unused tok

* make style

* make flake8 happy

* Update tests/models/opt/test_modeling_opt.py
2022-05-16 23:24:20 +02:00
f6a6388972 Add Tensorflow Swin model (#16988)
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-16 22:19:53 +01:00
6cb7187324 docs(transformers): fix typo (#17263) 2022-05-16 17:04:30 -04:00
053a80c606 logging documentation update (#17174)
* logging documentation

* style

Co-authored-by: Sander Land <sander@chatdesk.com>
2022-05-16 16:47:28 -04:00
8600d770d4 Use the PR URL in CI report (#17269)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-16 22:02:28 +02:00
3fb82f74fd Fix FlavaForPreTrainingIntegrationTest CI test (#17232)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-16 21:14:25 +02:00
9b0d2860eb Better error in the Auto API when a dep is missing (#17289) 2022-05-16 14:55:46 -04:00
66b3e106a1 Make TrainerHyperParameterSigOptIntegrationTest slow test (#17288)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-16 14:18:09 -04:00
ddb1a47ec8 Automatically sort auto mappings (#17250)
* Automatically sort auto mappings

* Better class extraction

* Some auto class magic

* Adapt test and underlying behavior

* Remove re-used config

* Quality
2022-05-16 13:24:20 -04:00
2f611f85e2 Mlflowcallback fix nonetype error (#17171)
* Fix edge cases TypeError: 'NoneType' object is not callable

* fix style
2022-05-16 12:18:30 -04:00
95b6bef624 Align logits and labels in OPT (#17237) 2022-05-16 09:37:39 -04:00
a5d1839679 Remove next sentence prediction from supported ONNX tasks (#17276) 2022-05-16 15:34:04 +02:00
05a90579a8 CodeParrot data pretokenization (#16932)
* add pretokenization arguments

* add pretokenization script

* add support for pretokenized data

* reformat code

* fix run command for training

* fix model call from config

* remove a package

* add comments on pretokenization in the readme

* remove explicit parallelization

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* update readme

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* update readme -remove username

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* update readme -remove username

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* keep data parallelization

* reformat code

* reformat code

* update readme

* reformat code

* Update examples/research_projects/codeparrot/README.md

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
2022-05-16 15:32:16 +02:00
e730e12567 Update codeparrot data preprocessing (#16944)
* add new preprocessing arguments

* add new filters

* add new filters to readme

* fix config and test count, update function names and docstrings

* reformat code

* update readme

* Update readme

* rename config_test filter

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* rename few_assignments filter

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* rename tokenizer in arguments

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* rename functions and add limit_line argument for config_test filter

* update threshold for config_test filter

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
2022-05-16 14:43:25 +02:00
518dd1277e Updated checkpoint support for Sagemaker Model Parallel (#17219)
* adding partial checkpoint support for optimizer state

* formatted trainer.py

* Refactoring based on comments

* reformatting

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/trainer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Cavdar <dcavdar@a07817b12d7e.ant.amazon.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-16 08:17:25 -04:00
71d18d0831 fixed bug in run_mlm_flax_stream.py (#17203)
* fixed bug run_mlm_flax_stream.py

Fixed bug caused by an update to tokenizer keys introduced in recent transformers versions (between `4.6.2` and `4.18.0`) where additional keys were introduced to the tokenizer output.

* Update run_mlm_flax_stream.py

* adding missing paranthesis

* formatted to black

* remove cols from dataset instead

* reformat to black

* moved rem. columns to map

* formatted to black

Co-authored-by: KennethEnevoldsen <kennethcenevolsen@gmail.com>
2022-05-16 13:40:27 +02:00
71abd3ade1 [WIP] [doc] performance/scalability revamp (#15723)
* [doc] performance/scalability revamp

* link the new docs

* no :

* mixed precision

* work on the first doc

* expand the main doc

* Trigger CI

* style

* revamp single GPU training section

* work on training performance

* remove files not used anymore or will be added later

* final touches

* fix rebase

* Add hardware section to toctree

* fix toctree again

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove `fast_tokenizers` entry that was copied in rebase

* add warning about DP vs DDP

* remove todo

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix missing closure of codeblock

* Update docs/source/en/perf_train_gpu_many.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* sync with #16860

* update toc

Co-authored-by: leandro <leandro.vonwerra@spoud.io>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-16 13:36:41 +02:00
d3d87b451e TF - Fix convnext classification example (#17261) 2022-05-16 12:24:01 +01:00
e86faecfd4 Fix obvious typos in flax decoder impl (#17279)
Change config.encoder_ffn_dim -> config.decoder_ffn_dim for decoder.
2022-05-16 13:08:04 +02:00
ee393c009a Guide to create custom models in Spanish (#17158)
* file copied and toctree updated

* Intro and configuration translated

* model section translated

* enter hotfix

* Translation over, correction pending

* Typos and corrections

* Update docs/source/es/create_a_model.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/create_a_model.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/create_a_model.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/create_a_model.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-13 16:19:29 -04:00
16be422912 Translated version of model_sharing.mdx doc to spanish (#16184)
* Translated version of model_sharing to spanish

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Update docs/source_es/model_sharing.mdx

* Addind model sharing to _toctree.yml

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-13 16:18:46 -04:00
f9024814e1 [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial (#17076)
* [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial

* Delete docs/source/pt-br directory

* [ fast_tokenizers.mdx ] - Continuing work on file

* [ fast_tokenizers.mdx ] - Continuing work on file

* Add fast tokenizers to _toctree.yml

* Eliminated config and toctree.yml

* Nits in fast_tokenizers.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-13 16:18:14 -04:00
50d1867cf8 Add PR title to push CI report (#17246)
* add PR title to push CI report

* add link

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-13 21:50:40 +02:00
506899d147 Fix push CI channel (#17242)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-13 20:59:56 +02:00
7198b63362 install dev. version of accelerate (#17243)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-13 13:47:09 -04:00
b96cb1693f Fix Trainer for Datasets that don't have dict items (#17239) 2022-05-13 11:49:23 -04:00
9c8fde8e19 Handle copyright in add-new-model-like (#17218) 2022-05-13 11:47:19 -04:00
993553b2f1 fix --gpus option for docker (#17235)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-13 17:26:26 +02:00
38043d8453 Update self-push workflow (#17177)
* update push ci

* install git-python

* update comment

* update deepspeed jobs

* fix report

* skip 2 more tests that require fairscale

* Fix changes in test_fetcher.py (to deal with `setup.py` is changed)

* set RUN_PT_TF_CROSS_TESTS=1 and final clean-up

* remove SIGOPT_API_TOKEN

* remove echo "$matrix_folders"

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-13 16:28:00 +02:00
18d6b356c5 OPT - fix docstring and improve tests slighly (#17228)
* correct some stuff

* fix doc tests

* make style
2022-05-13 15:14:50 +02:00
dfc76018c1 OPT-fix (#17229)
* try fixes

* Revert "try fixes"

This reverts commit a8ad75ef69d4fc03a402ef61bd034b018aa8555e.

* add correct shape

* add correct path
2022-05-13 15:14:23 +02:00
85fc455972 Added translation of installation.mdx to Portuguese Issue #16824 (#16979)
* Added translation of installation.mdx to Portuguese, as well
as default templates of _toctree.yml and _config.py

* [ build_documentation.yml ] - Updated doc_builder to build
documentation in Portuguese.
[ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.

* [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.

[ pipeline_tutorial.mdx ] - Grammar changes.

* [ accelerate.mdx ] - Translated to Portuguese the acceleration tutorial.

* [ multilingual.mdx ] - Added portuguese translation for multilingual tutorial.

[ training.mdx ] - Added portuguese translation for training tutorial.

* [ preprocessing.mdx ] - WIP

* Update _toctree.yml

* Adding Pré-processamento to _toctree.yml

* Update accelerate.mdx

* Nits and eliminate preprocessing file while it is ready

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-13 07:55:44 -04:00
3f936df662 Fix typo in bug report template (#17178)
* Fix typo

* Force rerun workflows

Co-authored-by: Felix Marty <felix@huggingface.co>
2022-05-12 16:31:12 -04:00
afe5d42d8d Black preview (#17217)
* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black
2022-05-12 16:25:55 -04:00
9bd67ac7bb update BART docs (#17212) 2022-05-12 19:25:16 +01:00
30be0da5da Fix dependency table 2022-05-12 11:29:32 -04:00
f04257fdbc Add test to ensure models can take int64 inputs (#17210)
* Add test to ensure models can take int64 inputs

* is_integer is an attribute, not a method

* Fix test when some inputs aren't tensors

* Add casts to blenderbot and blenderbot-small

* Add casts to the other failing models
2022-05-12 16:09:25 +01:00
5294fa12ee Dev version 2022-05-12 11:04:23 -04:00
9f16a1cc13 Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning) (#17194)
* Update data2vec.mdx

* Update data2vec.mdx

* Update docs/source/en/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-12 10:22:00 -04:00
a42242da7c migrate azure blob for beit checkpoints (#16902)
## Motivation

We are going to use a new blob account to store the checkpoints.

## Modification

Modify the azure blob storage URLs for BEiT checkpoints.
2022-05-12 13:08:15 +02:00
b971c769e8 Add OPT (#17088)
* First version - OPT model

* Final changes

- putting use cache to False

* few changes

- remove commented block

* few changes

- remove unecessary files

* fix style issues

* few changes

- remove a test file
- added the logits test

* Update src/transformers/models/auto/tokenization_auto.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add gen tests

* few changes

- rm mask filling example on docstring

* few changes

- remove useless args

* some changes

- more tests should pass now
- needs to clean more
- documentation still needs to be done

* fix code quality

* major changes

- change attention architecture to BART-like
- modify some tests
- style fix

* rm useless classes

- remove opt for:
- QA
- cond generation
- seq classif

* Removed autodoc calls to non-existant classes

TOkenizers are not implemented

* Update src/transformers/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/auto/modeling_tf_auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Replaced OPTTokeniser with GPT2 tokenizer

* added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")

* Removed OPTTokenizer

* make style

* Make style replaces

``` ...).unsqueeze(```
by
``` >>>).unsqueeze(```

* make repo consistency

* Removed PretrainedOPTModel

* fix opt.mdx removed other heads

* fix init, removed 3 heads

* removed heads

* finished cleaning head

* removed seauence classif and question answering

* removed unused imports

* removed useless dummy object for QA, SC and CG

* removed tests for removed useless dummy object for QA, SC and CG

* Removed head_mask using encoder layers which don't exist

* fixed test

* fix line

* added OPT to toctree

* Updated model path with pushed weigths

* fix model path

* fixed code quality

* fixed embeddings and generation tests

* update paths

* clean comments

* removed OPTClassificationHead for sentence classification

* renamed hidden layer

* renamed num layers to standard num_hidden_layers

* num_attention_heads fix

* changes for 125m

* add first version for 125m

* add first version - flax

* add new version

* causal LM output

* replace output type with BaseModelOutputWithPastAndCrossAttentions

* revert working config from 150m to 350m

* clean

* removed decoder input ids

* fixed embed dim

* more embed_dim issues

* make style + removed enc_dec test

* update falx model

* removed troublesome copy

* added is_encoder_decoder=False to config

* added set_input emb fuinction to model class

* requires torch on embed test

* use head mask instead of decoder head mask input param solves a test

* 8 test remaining, update

* Updated create_and_check_decoder_model_past_large_inputs

* Make style

* update op tokenizer with condition

* make style

* See if I can push

* some clean up

* remove linear head hack

* save intermediate

* save correct attention

* add copied from from bart

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix part of the reviewss
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* same changes in naming / conversion

* correct mask

* more fixes

* delete FlaxOPT and TfOPT

* clean traces of Flax and Tf

* fix mask

* fixed positionnal embedding length when past key value is provoded

* get 125m, 6.7b to work

* Added do_layer_norm

* solved mismatch in load dictionnary

* clean up preapre opt input dict

* fixed past key value as bool

* fix previus

* fixed return dict False tuple issue

* All tests are passing

* Make style

* Ignore OPTDecoder non tested

* make fix-copies

* make repo consistency

* small fix

* removed uselss @torch.no_grad decorator

* make styl;e

* fix previous opt test

* style

* make style

* added opt documentation

* update OPT_PRETRAINED_MODEL_ARCHIVE_LIST

* up

* more fixes

* model & config work

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* added comment on padding hack (+2)

* cleaup

* review update

* docstring for missing arg

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/opt/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update pretrained map

* update path and tests

* make style

* styling

* make consistency

* add gpt2 tok new

* more tok fixes

* Update src/transformers/models/auto/tokenization_auto.py

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/en/model_doc/opt.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/models/opt/test_modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/opt/modeling_opt.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update based on reviews

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* make style

* make tokenizer auto tests pass

* apply Lysandre suggestion

* finish tests

* add some good tokenizer tests

* improve docs slighly

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2022-05-12 12:24:35 +02:00
8c7481f35c ViT and Swin symbolic tracing with torch.fx (#17182)
* Support tracing for ViT

* Swin support

* Fix copies

* Fix type annotation issue

* Removed unused import
2022-05-12 10:42:27 +02:00
1a688709b3 Fix contents in index.mdx to match docs' sidebar (#17198)
* Fix contents in index.mdx to match docs' sidebar

* Eliminates api section from contents
2022-05-12 02:37:13 -05:00
b17b78897b Fix style error in Spanish docs (#17197) 2022-05-12 08:51:46 +02:00
1a66a6c677 Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples (#16685)
* Change nits in Spanish for quicktour.mdx

- Add tasks names in English too.
- Fix small nits in Spanish

* Translate index.mdx to Spanish

* Translate body of index.
* Translated the compatible models list (not the papers´ names). Since this should not be updated manually, I can come back to the original text.

* Add models and a  dataset for Spanish in the code exmaples

* Replaced the English models to Spanish versions.

* Add index to _toctree.yml and fix Spanish

* Fix double ““ error

* Change negative example in ASR example

* make style

* Debug style in quicktour.mdx
2022-05-11 23:35:07 -05:00
e2d678b71c Documentation: Spanish translation of fast_tokenizers.mdx (#16882)
* Spanish translation of fast_tokenizers.mdx

* add fast_tokenizers to the spanish _toctree.yml

* Update docs/source/es/fast_tokenizers.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/fast_tokenizers.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/fast_tokenizers.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/fast_tokenizers.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/fast_tokenizers.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/fast_tokenizers.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-11 22:25:44 -05:00
ae82da2181 Added es version of language_modeling.mdx doc (#17021)
* Spanish version of language_modeling.mdx doc file

* modification to toctree.yml file

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/language_modeling.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Correct position of Guías conceptuales

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-11 22:04:56 -05:00
36ddcc0d35 Spanish translation of philosophy.mdx #15947 (#16922)
* adding philosophy.mdx translation to Spanish

* adding philosophy.mdx translation to Spanish

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* Update docs/source/es/philosophy.mdx

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>

* philosophy translation to Spanish

* Update _toctree.yml

* Update _toctree.yml

* nits

Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
2022-05-11 20:47:50 -05:00
d1d5ebb16c Remove duplicated os.path.join (#17192) 2022-05-11 20:28:32 -04:00
a10f61834d [feat] Add FLAVA model (#16654)
* [WIP] Add FLAVA model

This PR aims to add [FLAVA](ihttps://arxiv.org/abs/2112.04482) model to the transformers repo.

Following checklist delineates the list of things to be done for this PR
to be complete:

[x] Flava init
[x] Flava base models
[x] Flava layers
[x] Flava Configs
[x] Flava encoders
[x] Flava pretraining models
[ ] Flava classification/retrieval models (To be added in a separate PR)
[x] Documentation updates 
[x] Imports updates 
[x] Argstring updates
[x] Flava pretrained checkpoints 
[x] Flava tests
[x] Flava processors 
[x] Sanity check
[x] Lint
2022-05-11 14:56:48 -07:00
7b95825d7d Remove columns before passing to data collator (#17187) 2022-05-11 15:58:32 -04:00
934e21cd4b add shift_tokens_right in FlaxMT5 (#17188) 2022-05-11 20:31:41 +01:00
47412c7d43 Ensure tensors are at least 1d for pad and concat (#17179)
* Ensure tensors are at least 1d for pad and concat

* Compatibility

* Fix

* Fix

* Add test

* Retrigger CI

* Consistency with master

* Retrigger CI
2022-05-11 13:19:08 -04:00
c76afa511c Fix LED documentation (#17181)
* Fix markdown code block

* Use consistent spelling for self-attention

* Fix typos and phrasing

* Fix code style
2022-05-11 13:17:50 -04:00
edcc66d27c Remove unnecessary columns for all dataset types in Trainer (#17166)
* Remove unneeded columns for IterableDataset

* Add test

* Update trainer tests

* Edit docstring

* Lint

* Apply feedback

* Apply feedback
2022-05-11 11:11:26 -04:00
c33f6046c3 [WIP] Enable reproducibility for distributed trainings (#16907)
* add seed worker and set_deterministic_seed_for_cuda function to enforce reproducability

* change function name to enable determinism, add docstrings, reproducability support for tf

* change function name to enable_determinism_for_distributed_training

* revert changes in set_seed and call set_seed within enable_full_determinism

* add one position argument for seed_worker function

* add full_determinism flag in training args and call enable_full_determinism when it is true

* add enable_full_determinism to documentation

* apply make fixup after the last commit

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-11 09:37:13 -04:00
5229744b26 Add missing RetriBERT tokenizer tests (#17017)
* Create RetriBERT tests folder

* Add missing RetriBERT tokenizer test file

* Apply style corrections

* Add non-english filter

* Update tests/retribert/test_tokenization_retribert.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Update tests/retribert/test_tokenization_retribert.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Move test files to new directory

* Update import path for testing utils to new test file structure

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2022-05-11 15:04:07 +02:00
6bc6797e04 Convert image to rgb for clip model (#17101)
Co-authored-by: kuanwee.heng <kuanwee.heng@aaqua.live>
2022-05-11 13:09:54 +01:00
0a2bea4752 Fix repo consistency 2022-05-11 08:05:45 -04:00
0645b07daf propagate "attention_mask" dtype for "use_past" in OnnxConfig.generate_dummy_inputs (#17105)
* propagate attention_mask dtype

* fixup&style
2022-05-11 07:50:35 -04:00
0e6ec2a469 Extend Transformers Trainer Class to Enable PyTorch SGD/Adagrad Optimizers for Training (#17154)
* add torch SGD and Adagrad optimizer bits

* refine naming

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-11 07:24:11 -04:00
63517fdf48 [M2M100 doc] remove duplicate example (#17175)
* remove duplicate example

* remove code block
2022-05-11 12:16:46 +01:00
4a419d4995 MobileBERT tokenizer tests (#16896)
* unhardcode pretrained model path, make it a class var

* add tests for mobilebert tokenizer

* allow tempfiles for vocab & merge similarity test to autodelete

* add explanatory comments

* remove unused imports, let make style do its.. thing

* remove inheritance and use BERT tok tests for MobileBERT

* Update tests/mobilebert/test_tokenization_mobilebert.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* amend class names, remove unused import, add fix for mobilebert's hub pathname

* unhardcode pretrained model path, make it a class var

* add tests for mobilebert tokenizer

* allow tempfiles for vocab & merge similarity test to autodelete

* add explanatory comments

* remove unused imports, let make style do its.. thing

* remove inheritance and use BERT tok tests for MobileBERT

* Update tests/mobilebert/test_tokenization_mobilebert.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* amend class names, remove unused import, add fix for mobilebert's hub pathname

* amend paths for model tests being in models/ subdir of /tests

* explicitly rm test from prev path

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
2022-05-10 16:39:58 -04:00
48a8f3daa1 Add DebertaV2ForMultipleChoice (#17135) 2022-05-10 16:21:44 -04:00
4ad2f68e34 Fix template init (#17163) 2022-05-10 15:24:23 -04:00
e99f0efedc Add MLFLOW_FLATTEN_PARAMS support in MLflowCallback (#17148)
* add support for MLFLOW_FLATTEN_PARAMS

* ensure key is str

* fix style and update warning msg

* Empty commit to trigger CI

* fix bug in check_inits.py

* add unittest for flatten_dict utils

* fix 'NoneType' object is not callable on __del__

* add generic flatten_dict unittest to SPECIAL_MODULE_TO_TEST_MAP

* fix style
2022-05-10 14:29:18 -04:00
976835d515 missing file (#17164) 2022-05-10 10:19:50 -07:00
259eeb6dab Fixing the output of code examples in the preprocessing chapter (#17162) 2022-05-10 12:16:28 -04:00
f861504466 [Deepspeed] add many more models to the model zoo test (#12695)
* model zoo take 2

* add deberta

* new param for zero2

* doc update

* doc update

* add layoutlm

* bump deepspeed

* add deberta-v2, funnel, longformer

* new models

* style

* add t5_v1

* update TAPAS status

* reorg problematic models

* move doc to another PR

* style

* fix checkpoint check test

* making progress on more models running

* cleanup

* new version

* cleanup
2022-05-10 08:22:42 -07:00
9aeacfe0ff [trainer] sharded _load_best_model (#17150)
* [trainer] sharded _load_best_model

probably needs a test?

* undo delete
2022-05-10 07:58:53 -07:00
1766fa2159 train args defaulting None marked as Optional (#17156)
Co-authored-by: Dom Miketa <dmiketa@exscientia.co.uk>
2022-05-10 10:09:34 -04:00
6d80c92c77 LogSumExp trick question_answering pipeline. (#17143)
* LogSumExp trick `question_answering` pipeline.

* Adding a failing test.
2022-05-10 10:03:55 +02:00
d719bcd46a Fix all docs for accelerate install directions (#17145) 2022-05-09 15:45:18 -04:00
766d4bf792 Fix MLflowCallback end_run() and add support for tags and nested runs (#17130)
* ensure mlflow.end_run() is executed at end of training when mlflow.start_run() was executed by the callback

* add debug msg

* add support for MLFLOW_TAGS, MLFLOW_RUN_ID, and MLFLOW_NESTED_RUN

* update to support python 3.6+

* Validate env variables using ENV_VARS_TRUE_VALUES

* Empty-Commit
2022-05-09 13:09:48 -04:00
2fbb237967 Add the auto_find_batch_size capability from Accelerate into Trainer (#17068)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

- Adds auto_batch_size finder 
- Moves training loop to an inner training loop
2022-05-09 12:29:18 -04:00
df735d1317 [WIP] Fix Pyright static type checking by replacing if-else imports with try-except (#16578)
* rebase and isort

* modify cookiecutter init

* fix cookiecutter auto imports

* fix clean_frameworks_in_init

* fix add_model_to_main_init

* blackify

* replace unnecessary f-strings

* update yolos imports

* fix roberta import bug

* fix yolos missing dependency

* fix add_model_like and cookiecutter bug

* fix repository consistency error

* modify cookiecutter, fix add_new_model_like

* remove stale line

Co-authored-by: Dom Miketa <dmiketa@exscientia.co.uk>
2022-05-09 11:28:53 -04:00
7783fa6bb3 Fix quality and repo consistency 2022-05-09 11:14:36 -04:00
05fc1766ff PyTorch FSDP integration in Trainer (#17136)
* PyTorch FSDP integration in Trainer

* reformatting

make style and make quality are now compliant.

* Updating dependency check

* Trigger CI

Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-05-09 20:40:56 +05:30
dc3645dc9c add mobilebert onnx configs (#17029)
* update docs of length_penalty

* Revert "update docs of length_penalty"

This reverts commit 466bf4800b75ec29bd2ff75bad8e8973bd98d01c.

* add mobilebert onnx config

* address suggestions

* Update auto.mdx

* Update __init__.py

* Update features.py
2022-05-09 10:36:53 -04:00
a021f2b90c Add type hints for BigBirdPegasus and Data2VecText PyTorch models (#17123)
* Add type hints for remaining BigBirdPegasus models

Here I added type hints to the BigBirdPegasusForCausalLM class.

* Add missing type hints for Data2VecText models

Added type hints to the Data2VecTextForCausalLM, Data2VecTextForMaskedLM,
Data2VecTextForMultipleChoice, Data2VecTextForQuestionAnswering,
Data2VecTextForSequenceClassification, and
Data2VecTextForTokenClassification classes.
2022-05-09 12:45:43 +01:00
e9fd583ce0 LayoutLMv2Processor: ensure 1-to-1 mapping between images and samples in case of overflowing tokens (#17092)
* add get_overflowing_images function to ensure 1-to-1 mapping between samples and images in LayoutLMv2Processor

* make style

* add test for overflowing_tokens, change assert to ValueError, avoiding unrelated formatting changes

* change line length by passing --preview into black
2022-05-09 07:39:08 -04:00
3212afa614 split single_gpu and multi_gpu (#17083)
* split single_gpu and multi_gpu

* update needs in send_result

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-09 07:13:07 -04:00
215e0681e4 Added BigBirdPegasus onnx config (#17104)
* Add onnx configuration for bigbird-pegasus

* Modify docs
2022-05-06 17:31:00 +02:00
351cdbdfdc Fix self-push CI report path in cat (#17111)
* fix report cat path

* fix report cat path

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-06 07:45:17 -07:00
cad61b6839 Fix link to example scripts (#17103) 2022-05-05 15:20:27 -05:00
a59eb349c5 fix missing "models" in pipeline test module (#17090)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-05 16:12:01 +02:00
dd16a113a4 Remove torchhub test (#17097) 2022-05-05 10:02:47 -04:00
c849a61e65 Fix MLflowCallback and add support for MLFLOW_EXPERIMENT_NAME (#17091)
* Fix use of mlflow.active_run() and add proper support for MLFLOW_EXPERIMENT_NAME

* Fix code style (make style)
2022-05-05 09:49:55 -04:00
99289c08a1 Add type hints for BERTGeneration (#17047)
Added type hints for the BERTGenerationEncoder and BERTGenerationDecoder
classes.
2022-05-05 12:22:46 +01:00
45360e1a8e type hints for pytorch models (#17064)
* type hints for pytorch models

* fixed import error

* fixed some errors
2022-05-05 12:21:17 +01:00
db377a0b37 Added spanish translation of autoclass_tutorial. (#17069)
* Added spanish translation of autoclass_tutorial.
Added 'local' and 'title' fields for autoclass_tutorial.

* Fixed autoclass_tutorial title in _toctree.yml and autoclass_tutorial.mdx
2022-05-04 14:18:24 -05:00
6dc4c36acb minor change on TF Data2Vec test (#17085)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-04 18:39:30 +02:00
23619ef6b7 📝 open fresh PR for pipeline doctests (#17073) 2022-05-04 11:30:34 -05:00
870e6f29a6 Fix DeBERTa token_type_ids (#17082) 2022-05-04 18:23:37 +02:00
279bc5849b Allow saved_model export of TFCLIPModel in save_pretrained (#16886)
* CLIP Serving

* Add type hints per code review

* Use black, flake8, and isort

* Update src/transformers/models/clip/modeling_tf_clip.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Rollback serving_output and add TODO

* Remove irrelevant portions of failing tests

* Revert "Rollback serving_output and add TODO"

This reverts commit a4abfa6ba3b7875a13538dbc2ddc4eb17dfcca8d.

* Rollback to original test/serving_output

* Fix unused var

* Apply suggestions from code review

* Update formatting with black

* Fix style again from rebase

* Update tests/models/clip/test_modeling_tf_clip.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sean Moriarity <sean.l.moriarity.mil@army.mil>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2022-05-04 16:37:58 +02:00
ef20390291 Update to build via git for accelerate (#17084) 2022-05-04 09:42:36 -04:00
bb8d40529e Deprecate model templates (#17062)
* Deprecate model templates

* Address review comments
2022-05-04 09:36:38 -04:00
9c5ae87f13 Type hint complete Albert model file. (#16682)
* Type hint complete Albert model file.

* Update typing.

* Update src/transformers/models/albert/modeling_albert.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-05-04 14:35:12 +01:00
2bf95e2b09 Bump notebook from 6.4.1 to 6.4.10 in /examples/research_projects/lxmert (#16634)
Bumps [notebook](http://jupyter.org) from 6.4.1 to 6.4.10.

---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-04 08:27:40 -04:00
7a229ef446 Bump notebook in /examples/research_projects/visual_bert (#16635)
Bumps [notebook](http://jupyter.org) from 6.4.1 to 6.4.10.

---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-04 08:27:27 -04:00
049e791758 Add Data2Vec for Vision in TF (#17008)
* add utilities till TFData2VecVisionLayer.

* chore: pass window_size to attention layer.

* feat: add TFData2VecVisionRelativePositionBias.

* feat: initial implementation ready for tf data2vec.

* fix: relative position bias index, table to be fixed.

* chore: implementation added, tests remaining.

* add: tests, other PR files.

* fix: code quality.

* fix: import structure in init.

* chore: run make fix-copies.

* chore: address PR feedback (round I).

* chore: styling nit.

* fix: tests due to removal of to_2tuple().

* chore: rebase with upstream main and move the test.

* Update src/transformers/models/auto/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/auto/modeling_tf_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix: layer call.

* chore: remove from_pt=True and rerun test.

* chore: remove cast and tf.divide.

* chore: minor edits to the test script.

* Update src/transformers/models/data2vec/modeling_tf_data2vec_vision.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* fix: expand() on TF tensors with broadcast_to().

* fix: test import.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-05-04 08:08:25 -04:00
d76d2a2af7 Make sure telemetry arguments are not returned as unused kwargs (#17063)
* Make sure telemetry arguments are not returned as unused kwargs

* Fix test
2022-05-04 07:47:57 -04:00
675e2d1663 Remove masked image modeling from BEIT ONNX export (#16980)
* Add masked image modelling to task mapping

* Refactor ONNX features to be listed alphabetically

* Add warning about BEiT masked image modeling

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-04 10:05:24 +02:00
4bb1d0ec84 Skip RoFormer ONNX test if rjieba not installed (#16981)
* Skip RoFormer ONNX test if rjieba not installed

* Update deps table

* Skip RoFormer serialization test

* Fix RoFormer vocab

* Add rjieba to CircleCI
2022-05-04 10:04:10 +02:00
db034660fb Fix hashing for deduplication (#17048) 2022-05-04 08:40:24 +02:00
39f8eafc1b Remove device parameter from create_extended_attention_mask_for_decoder (#16894) 2022-05-03 11:06:11 -04:00
dd739f7045 Remove fetch in model templates test 2022-05-03 10:49:12 -04:00
1c9fcd0e04 Fix RNG reload in resume training from epoch checkpoint (#17055)
* Fix RNG reload in resume training from epoch checkpoint

* Fix test
2022-05-03 10:31:24 -04:00
6e17ba6aa5 Remove Python and use v2 action (#17059) 2022-05-03 10:12:17 -04:00
a8fa2f91f4 Make Trainer compatible with sharded checkpoints (#17053)
* Make Trainer compatible with sharded checkpoints

* Add doc
2022-05-03 09:55:10 -04:00
19420fd99e Move test model folders (#17034)
* move test model folders (TODO: fix imports and others)

* fix (potentially partially) imports (in model test modules)

* fix (potentially partially) imports (in tokenization test modules)

* fix (potentially partially) imports (in feature extraction test modules)

* fix import utils.test_modeling_tf_core

* fix path ../fixtures/

* fix imports about generation.test_generation_flax_utils

* fix more imports

* fix fixture path

* fix get_test_dir

* update module_to_test_file

* fix get_tests_dir from wrong transformers.utils

* update config.yml (CircleCI)

* fix style

* remove missing imports

* update new model script

* update check_repo

* update SPECIAL_MODULE_TO_TEST_MAP

* fix style

* add __init__

* update self-scheduled

* fix add_new_model scripts

* check one way to get location back

* python setup.py build install

* fix import in test auto

* update self-scheduled.yml

* update slack notification script

* Add comments about artifact names

* fix for yolos

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-05-03 14:42:02 +02:00
cd9274d010 [FlaxBert] Add ForCausalLM (#16995)
* [FlaxBert] Add ForCausalLM

* make style

* fix output attentions

* Add RobertaForCausalLM

* remove comment

* fix fx-to-pt model loading

* remove comment

* add modeling tests

* add enc-dec model tests

* add big_bird

* add electra

* make style

* make repo-consitency

* add to docs

* remove roberta test

* quality

* amend cookiecutter

* fix attention_mask bug in flax bert model tester

* tighten pt-fx thresholds to 1e-5

* add 'copied from' statements

* amend 'copied from' statements

* amend 'copied from' statements

* quality
2022-05-03 11:26:19 +02:00
31616b8d61 [T5 Tokenizer] Model has no fixed position ids - there is no hardcode… (#16990)
* [T5 Tokenizer] Model has no fixed position ids - there is no hardcoded max length

* [T5 Tokenizer] Model has no fixed position ids - there is no hardcoded max length

* correct t5 tokenizer

* correct t5 tokenizer

* fix test

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* finish

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-05-02 21:27:34 +02:00
1073f00d4e Clean up setup.py (#17045)
* Clean up setup.py

* Trigger CI

* Upgrade Python used
2022-05-02 12:58:17 -04:00
30ca529902 Make the sacremoses dependency optional (#17049)
* Make sacremoses optional

* Pickle
2022-05-02 12:47:47 -04:00
bb2e088be7 Allow all imports from transformers (#17050) 2022-05-02 12:47:39 -04:00
1ac698744c Add YOLOS (#16848)
* First draft

* Add YolosForObjectDetection

* Make forward pass work

* Add mid position embeddings

* Add interpolation of position encodings

* Add expected values

* Add YOLOS to tests

* Add integration test

* Support tiny model as well

* Support all models in conversion script

* Remove mid_pe_size attribute

* Make more tests pass

* Add model to README and fix config

* Add copied from statements

* Rename base_model_prefix to vit

* Add missing YOLOS_PRETRAINED_CONFIG_ARCHIVE_MAP

* Apply suggestions from code review

* Apply more suggestions from code review

* Convert remaining checkpoints

* Improve docstrings

* Add YolosFeatureExtractor

* Add feature extractor to docs

* Add corresponding tests

* Fix style

* Fix docs

* Apply suggestion from code review

* Fix bad rebase

* Fix some more bad rebase

* Fix missing character

* Improve docs and variable names

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-05-02 18:30:55 +02:00
f275e593bf Fix no_trainer examples to properly calculate the number of samples (#17046)
* Update all examples to properly calculate progress bar
2022-05-02 11:56:25 -04:00
35d48db881 Update no_trainer examples to use new logger (#17044)
* Propagate and fix imports
2022-05-02 11:56:15 -04:00
daecae1f1c [Trainer] Move logic for checkpoint loading into separate methods for easy overriding (#17043) 2022-05-02 10:40:37 -04:00
2de2c9ecca Clean up vision tests (#17024)
* Clean up tests

* Make fixup

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-05-02 16:28:58 +02:00
4be8b95a9f Disable Flax GPU tests on push (#17042) 2022-05-02 10:25:53 -04:00
bdd690a74d add torch.no_grad when in eval mode (#17020)
* add torch.no_grad when in eval mode

* make style quality
2022-05-02 07:49:19 -04:00
9586e222af Fix typo in RetriBERT docstring (#17018) 2022-05-02 07:48:20 -04:00
93b802c43e [Flax(Speech)EncoderDecoder] Fix bug in decoder_module (#17036)
* [FlaxSpeechEncoderDecoder] Fix bug in `decoder_module`

* [FlaxEncoderDecoder] Fix bug in `decoder_module`
2022-05-02 13:06:45 +02:00
1ae182d9a6 Fix style 2022-05-02 06:19:31 -04:00
2c2a2169b6 Fx with meta (#16836)
* Add meta proxy

* Uses meta data to trace data dependent control-flow

* Remove commented class

* Handles torch creating functions

* Added type annotation to fix tracing

* Tracing works for everything but T5 and GPT-J

* Almost all previously supported models pass

* All architectures can be traced except T5

* Intermediate commit to have a trace of the comparison operators for HFProxy

* Everything works, except loss computation

* Everything works

* Removed unused import

* Overriden methods do not use underlying ops (linear and torch.matmul), and model attributes are copied to the traced version

* Fix torch_matmul_override

* Change attributes reference to deepcopy

* Remove breakpoint and add torch_index_override

* Small fix

* Fix typo

* Replace asserts by explicit exceptions
2022-05-02 11:46:52 +02:00
ff846e9b28 [FlaxGenerate] Fix bug in decoder_start_token_id (#17035) 2022-05-02 11:05:27 +02:00
eb877f1fd0 update docs of length_penalty (#17022) 2022-05-02 11:01:18 +02:00
da47c264f9 Add translating guide (#17004)
* Add translating guide
2022-04-30 17:43:38 -05:00
ede5e04191 Add a check on config classes docstring checkpoints (#17012)
* Add the check

* add missing ckpts

* add a list to ignore

* call the added check script

* better regex pattern

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-30 10:40:46 +02:00
7152ed2bae Result of new doc style with fixes (#17015)
* Result of new doc style with fixes

* Add last two files

* Bump hf-doc-builder
2022-04-29 17:42:15 -04:00
18df440709 Replace dict/BatchEncoding instance checks by Mapping (#17014)
* Replace dict/BatchEncoding instance checks by Mapping

* Typo
2022-04-29 17:20:52 -04:00
b8dffd1f3e Revert "Updating variable names. (#16445)" (#17011)
This reverts commit 4f3a14e3c235c8b6b8cd2f5bc448a0cffacddf61.
2022-04-29 12:26:45 -04:00
4f3a14e3c2 Updating variable names. (#16445) 2022-04-29 17:44:28 +02:00
20fb5d51ea Update README_zh-hans.md (#16977) 2022-04-29 11:05:03 -04:00
63fbed5c59 Make create_extended_attention_mask_for_decoder static method (#16893) 2022-04-29 10:57:09 -04:00
fb0ae12947 TF: XLA bad words logits processor and list of processors (#16974) 2022-04-29 15:54:58 +01:00
57e6464ac9 Update all require decorators to use skipUnless when possible (#16999) 2022-04-29 08:55:38 -04:00
e952e049b4 use scale=1.0 in floats_tensor called in speech model testers (#17007)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-29 14:41:33 +02:00
e6f00a11d7 Update README to latest release (#16997) 2022-04-28 14:17:44 -04:00
3486a92a57 Fix savedir for by epoch (#16996) 2022-04-28 13:49:45 -04:00
5af5735f62 set eos_token_id to None to generate until max length (#16989)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-28 19:47:38 +02:00
01562dac7e Rename a class to reflect framework pattern AutoModelXxx -> TFAutoModelXxx (#16993) 2022-04-28 18:11:54 +01:00
1be8d56ec6 Add parameter --config_overrides for run_mlm_wwm.py (#16961)
* dd parameter --config_overrides for run_mlm_wwm.py

* linter
2022-04-28 10:44:55 -04:00
1f9e862507 Update check_models_are_tested to deal with Windows path (#16973)
* fix

* Apply suggestions from code review

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-28 15:31:57 +02:00
dced262409 Update tokenization_bertweet.py (#16941)
The emoji version must be either 0.5.4 or 0.6.0. Newer emoji versions have been updated to newer versions of the Emoji Charts, thus not consistent with the one used for pre-processing the pre-training Tweet corpus (i.e. not consistent with the vocab).
2022-04-27 16:54:31 -04:00
992996e9ca Add -e flag to some GH workflow yml files (#16959)
* Add -e flag

* add check

* create new keys

* run python setup.py build install

* add comments

* change to develop

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-27 21:44:21 +02:00
596afb4297 Fix check_all_models_are_tested (#16970)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-27 21:18:29 +02:00
691cdbb7d7 Fix doc notebooks links (#16969)
* Fix doc notebooks links

* Remove missing section
2022-04-27 14:59:53 -04:00
60e1d883f1 Fixup no_trainer save logic (#16968)
* Fixup all examples
2022-04-27 14:46:49 -04:00
c79bbc3ba5 Fix multiple deletions of the same files in save_pretrained (#16947)
* Fix multiple deletions of the same files in save_pretrained

* Add is_main_process argument
2022-04-27 12:28:42 -04:00
bfbec17765 Fix add-new-model-like when model doesn't support all frameworks (#16966) 2022-04-27 11:15:25 -04:00
cf8a7c2490 Update custom_models.mdx (#16964)
BertModelForSequenceClassification -> BertForSequenceClassification
2022-04-27 16:46:55 +02:00
5896b3ecce Fix distributed_concat with scalar tensor (#16963)
* Fix `distributed_concat` with scalar tensor

* Update trainer_pt_utils.py
2022-04-27 10:26:22 -04:00
084c38c59d [HF Argparser] Fix parsing of optional boolean arguments (#16946)
* Add fix

* Apply suggestion from code review

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-04-27 15:00:45 +02:00
c82e017aa9 Misc. fixes for Pytorch QA examples: (#16958)
1. Fixes evaluation errors popping up when you train/eval on squad v2 (one was newly encountered and one that was previously reported Running SQuAD 1.0 sample command raises IndexError #15401 but not completely fixed).
2. Removes boolean arguments that don't use store_true. Please, don't use these: *ANY non-empty string is being converted to True in this case and this clearly is not the desired behavior (and it creates a LOT of confusion).
3. All no-trainer test scripts are now saving metric values in the same way (with the right prefix eval_), which is consistent with the trainer-based versions.
4. Adds forgotten model.eval() in the no-trainer versions. This improved some results, but not everything (see the discussion in the end). Please, see the F1 scores and the discussion below.
2022-04-27 08:51:39 -04:00
49d5bcb0f3 Fix HubertRobustTest PT/TF equivalence test on GPU (#16943)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-27 10:50:03 +02:00
479fdc4925 Add semantic script, trainer (#16834)
* Add first draft

* Improve script and README

* Improve README

* Apply suggestions from code review

* Improve script, add link to resulting model

* Add corresponding test

* Adjust learning rate
2022-04-27 10:12:18 +02:00
a4a88fa09f [Research] Speed up evaluation for XTREME-S (#16785)
* Avoid repeated per-lang filtering

* Language groups and logits preprocessing

* Style
2022-04-27 08:34:21 +02:00
2d91e3c304 use original loaded keys to find mismatched keys (#16920) 2022-04-26 17:29:52 -04:00
d365f5074f Fix RuntimeError message format (#16906) 2022-04-26 17:08:28 -04:00
10dfa126b7 documentation: some minor clean up (#16850) 2022-04-26 16:56:08 -04:00
aaee4038c3 Add onnx config for RoFormer (#16861)
* add roformer onnx config
2022-04-26 16:51:15 +02:00
8afaaa26f5 FIx Iterations for decoder (#16934)
FIx Iterations for decoder
2022-04-26 12:54:14 +02:00
fa32247406 apply torch int div to layoutlmv2 (#15457)
* apply torch int div

* black linting fixup

* update path to torch_int_div

* clarify imports
2022-04-26 10:07:51 +02:00
344b9fb0c6 Limit the use of PreTrainedModel.device (#16935)
* Limit the use of PreTrainedModel.device

* Fix
2022-04-25 20:58:50 -04:00
6568752039 Fix issue probably-meant-fstring found at https://codereview.doctor (#16913) 2022-04-25 15:15:00 -04:00
fea94d6790 Replace deprecated logger.warn with warning (#16876) 2022-04-25 15:12:51 -04:00
e03966e404 TF: XLA stable softmax (#16892)
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-25 20:10:51 +01:00
8246caf3eb added deit onnx config (#16887)
* added deit onnx config
2022-04-25 20:50:45 +02:00
9331b37967 TF: XLA Logits Warpers (#16899)
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-04-25 19:48:08 +01:00
809dac48f9 TF: XLA logits processors - minimum length, forced eos, and forced bos (#16912)
* XLA min len, forced eos, and forced bos

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-04-25 19:27:53 +01:00
f6210c49e2 Fix RemBertTokenizerFast (#16933)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-25 19:51:50 +02:00
32adbb26d6 Fix PyTorch RAG tests GPU OOM (#16881)
* add torch.cuda.empty_cache in some PT RAG tests

* torch.cuda.empty_cache in tearDownModule()

* tearDown()

* add gc.collect()

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-25 17:33:56 +02:00
3e47d19cfc Add missing ckpt in config docs (#16900)
* add missing ckpt in config docs

* add more missing ckpt in config docs

* fix wrong ckpts

* fix realm ckpt

* fix s2t2

* fix xlm_roberta ckpt

* Fix for deberta v2

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* use only one checkpoint for DPR

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-04-25 17:31:45 +02:00
3a71e94a92 Fix doc test quicktour dataset (#16929)
* fix doc test

* fix doc test

Co-authored-by: Patrick <patrick@pop-os.localdomain>
2022-04-25 16:26:59 +02:00
508baf1943 add bigbird typo fixes (#16897)
Co-authored-by: ChainYo <t.chaigneau.tc@gmail.com>
2022-04-25 11:32:06 +02:00
72728be3db [DocTests] Fix some doc tests (#16889)
* [DocTests] Fix some doc tests

* hacky fix

* correct
2022-04-23 08:40:14 +02:00
22fc93c4d9 Changes in create_optimizer to support tensor parallelism with SMP (#16880)
* changes in create optimizer to support tensor parallelism with SMP

* Update src/transformers/trainer.py

Convert if check to one line.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Cavdar <dcavdar@a07817b12d7e.ant.amazon.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-22 15:24:38 -04:00
99c8226b12 TF: XLA repetition penalty (#16879) 2022-04-22 18:29:32 +01:00
ec81c11a18 Add OnnxConfig for ConvBERT (#16859)
* add OnnxConfig for ConvBert

Co-authored-by: ChainYo <t.chaigneau.tc@gmail.com>
2022-04-22 18:19:15 +02:00
0d1cff1195 Add doc tests for Albert and Bigbird (#16774)
* Add doctest BERT

* make fixup

* fix typo

* change checkpoints

* make fixup

* define doctest output value, update doctest for mobilebert

* solve fix-copies

* update QA target start index and end index

* change checkpoint for docs and reuse defined variable

* Update src/transformers/models/bert/modeling_tf_bert.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* make fixup

* Add Doctest for Albert and Bigbird

* make fixup

* overwrite examples for Albert and Bigbird

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update longer examples for Bigbird

* using examples from squad_v2

* print out example text

* change name token-classification-big-bird checkpoint to random

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-04-22 18:07:16 +02:00
9fa88172c2 Minor fixes/improvements in convert_file_size_to_int (#16891)
* Minor improvements to `convert_file_size_to_int`

* Add <unit>bit version to kilos and megas

* Minor fix
2022-04-22 16:54:20 +02:00
6d90d76f5d TF: rework XLA generate tests (#16866) 2022-04-22 12:38:08 +01:00
3b1bbefc47 Add missing entries in mappings (#16857)
* add missing entries in some mappings

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-22 10:53:24 +02:00
d91841315a New features for CodeParrot training script (#16851)
* add tflops logging and fix grad accumulation

* add accelerate tracking and checkpointing

* scale loss of last batch correctly

* fix typo

* compress loss computation

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* add resume from checkpoint argument

* add load_state accelerate from checkpoint, register lr scheduler and add tflops function

* reformat code

* reformat code

* add condition on path for resume checkpoint

* combine if conditions

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>

* add source for tflops formula

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
2022-04-21 18:43:46 +02:00
eef2422e96 Fix doctest list (#16878)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-21 18:12:14 +02:00
0b1e0fcf7a Fix GPT-J onnx conversion (#16780)
* add gptj to TOKENIZER_MAPPING_NAMES

* fix int32 to float to avoid problem in onnx

* Update src/transformers/models/gptj/modeling_gptj.py

Co-authored-by: ChainYo <t.chaigneau.tc@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-04-21 15:55:30 +02:00
bae9b6458c Use ACT2FN to fetch ReLU activation (#16874)
- all activations should be fetched through ACT2FN
- it returns ReLU as `nn.Module`, which allows attaching hooks on the activation function and prints it to stdout when `print(model)`
2022-04-21 09:33:29 -04:00
cb555af2c7 Return input_ids in ImageGPT feature extractor (#16872) 2022-04-21 09:09:00 -04:00
e789418ebe Adding support for array key in raw dictionnaries in ASR pipeline. (#16827)
* Adding support for `array` key in raw dictionnaries in ASR pipeline.

* ES .

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Making it work by not popping `array` first.

* Black 22.3

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-21 14:39:10 +02:00
daf520b033 tiny tweak to allow BatchEncoding.token_to_char when token doesn't correspond to chars (#15901)
* tweak to allow BatchEncoding.char_to_token(0)

* update docstring

* remote trailing whitespace

* make fixup

* make value checking for span_indices explicit

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-21 08:07:54 -04:00
cb7e166428 t5: add conversion script for T5X to FLAX (#16853)
* t5: add conversion script for T5X to FLAX

* t5: make flake happy

* t5: add copyright message to t5x conversion script

* t5: fix lm head for v1.0 checkpoints
2022-04-21 13:00:35 +02:00
6620f60c0a Long QuestionAnsweringPipeline fix. (#16778)
* Temporary commit witht the long QA fix.

* Adding slow tests covering this fix.

* Removing fast test as it doesn't fail anyway.
2022-04-21 09:59:25 +02:00
705d65368f Fix multiproc metrics in no_trainer examples (#16865) 2022-04-20 17:26:27 -04:00
175da8d182 Fix custom init sorting script (#16864) 2022-04-20 17:05:39 -04:00
67ed0e43dc [docs] fix url (#16860) 2022-04-20 11:01:24 -07:00
afa1ef0992 [modeling_utils] use less cpu memory with sharded checkpoint loading (#16844)
* less cpu memory with sharded checkpoint loading

* Trigger CI

* Trigger CI
2022-04-20 07:44:37 -07:00
e13a91fe60 Fixing return type tensor with num_return_sequences>1. (#16828)
* Fixing return type tensor with `num_return_sequences>1`.

* Nit.
2022-04-20 16:11:51 +02:00
ff06b17791 add DebertaV2 fast tokenizer (#15529)
Co-authored-by: alcinos <carion.nicolas@gmail.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Co-authored-by: Nicolas Carion <carion.nicolas@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-20 10:26:51 +02:00
e1c153cbaa [Typo] Fix typo in modeling utils (#16840) 2022-04-19 23:09:03 +02:00
3104036e7f Add support for bitsandbytes (#15622)
* Add initial BNB integration

* fixup! Add initial BNB integration

* Add bnb test decorator

* Update Adamw8bit option name

* Use the full bnb package name

* Overide bnb for all embedding layers

* Fix package name

* Formatting

* Remove unnecessary import

* Update src/transformers/trainer.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename AdamwBNB optimizer option

* Add training test checking that bnb memory utilization is lower

* fix merge

* fix merge; fix + extend new test

* cleanup

* expand bnb

* move all require_* candidates to testing_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-04-19 16:01:29 -04:00
e6d23a4b9b Improve test_pt_tf_model_equivalence on PT side (#16731)
* Update test_pt_tf_model_equivalence on PT side

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-19 21:13:27 +02:00
3dd57b15c5 Type hints added to Speech to Text (#16506)
* Type hints added

* return hints added

* Update src/transformers/models/speech_to_text/modeling_tf_speech_to_text.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-04-19 17:58:08 +01:00
1efca4e6c8 replace Speech2TextTokenizer by Speech2TextFeatureExtractor in some docstrings (#16835)
* replace `Speech2TextTokenizer` by `Speech2TextFeatureExtractor` in docstring

* quality
2022-04-19 18:32:22 +02:00
b5c6a63ed9 Correct Logging of Eval metric to Tensorboard (#16825)
* Correct Logging of Eval metric to Tensorboard

An empty dictionary ``eval_metrics`` was being logged, is replaced by ``eval_metric`` which is the output dictionary of ``metric.compute()``.

* Remove unused variable
2022-04-19 17:27:54 +02:00
f09c45e067 TF: Add sigmoid activation function (#16819) 2022-04-19 16:13:08 +01:00
74814574ae Add doc about attention_mask on gpt2 (#16829)
* Add doc about `attention_mask` on gpt2

Add a simple sentence describing how `attention_mask` needs to be constructed when ``past_key_values` is used.

* Add doc about attention_mask on gpt2_tf

* clean up style

* remove empty line white spaces

* remove whitespace in empty line
2022-04-19 16:32:26 +02:00
b96e82c80a Add image classification script, no trainer (#16727)
* Add first draft

* Improve README and run fixup

* Make script aligned with other scripts, improve README

* Improve script and add test

* Remove print statement

* Apply suggestions from code review

* Add num_labels to make test pass

* Improve README
2022-04-19 16:32:08 +02:00
db9f189121 [ASR Pipeline] Correct init docs (#16833)
* correct

* up
2022-04-19 16:12:36 +02:00
77de8d6c31 Add onnx export of models with a multiple choice classification head (#16758)
* Add export of models with a multiple-choice classification head
2022-04-19 15:51:51 +02:00
b74a955325 fix rum_clm.py seeking text column name twice (#16624) 2022-04-19 14:38:25 +01:00
3663fca41b Type hints added for TFMobileBert (#16505)
* Type hints added

* make style

* Return type hints added

* fixed typo

Co-authored-by: matt <rocketknight1@gmail.com>
2022-04-19 14:37:03 +01:00
a2392415e9 Some tests misusing assertTrue for comparisons fix (#16771)
* Fix issue avoid-misusing-assert-true found at https://codereview.doctor

* fix tests

* fix tf

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-04-19 14:44:08 +02:00
d3bd9ac728 [Flax] improve large model init and loading (#16148)
* begin do_init

* add params_shape_tree

* raise error if params are accessed when do_init is False

* don't allow do_init=False when keys are missing

* make shape tree a property

* assign self._params at the end

* add test for do_init

* add do_init arg to all flax models

* fix param setting

* disbale do_init for composite models

* update test

* add do_init in FlaxBigBirdForMultipleChoice

* better names and errors

* improve test

* style

* add a warning when do_init=False

* remove extra if

* set params after _required_params

* add test for from_pretrained

* do_init => _do_init

* chage warning to info

* fix typo

* add params in init_weights

* add params to gpt neo init

* add params to init_weights

* update do_init test

* Trigger CI

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* update template

* trigger CI

* style

* style

* fix template

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-04-19 14:19:55 +02:00
6de4ee61a0 Wav2 vec2 phoneme ctc tokenizer optimisation (#16817)
* Solved href rendering issue in heading

Markdown references in headings such as '####' don't render well.
Replaced it with <h4>...<a></a></h> banners.

* PhonemeTokenizer optimization using phonemizer lib

The backend should only be initialized once, otherwise it is reloaded.
Added `init_backend` function, intializes a backend attribute.
Phonemize re-uses self.backend.
Should give ~10 times faster phonemization.

* formatted file with make style

* Documentation suggestion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update /tokenization_wav2vec2_phoneme.py based on PR suggestion

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update CONTRIBUTING.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-19 07:39:04 -04:00
306c9ee966 Fix LayoutLMv2 tokenization docstrings (#16187)
* Fix docstrings

* Fix up

* Fix
2022-04-19 12:14:51 +02:00
7db7aab439 Add semantic script no trainer, v2 (#16788)
* Add first draft from previous PR

* First draft

* Improve README and remove num_labels

* Make script more aligned with other scripts

* Improve README and apply suggestion from code review
2022-04-19 09:07:29 +02:00
494c2a8c4d Clean up semantic segmentation tests (#16801)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-04-19 09:02:19 +02:00
989a15d173 fix _setup_devices in case where there is no torch.distributed package in build (#16821)
* fix _setup_devices in case where there is not torch.distributed

* in training_args_sm.py as well
2022-04-18 18:36:46 -04:00
c11a49573f Refactor issues with yaml (#16772)
* Refactor issues with yaml

* Update .github/ISSUE_TEMPLATE/bug-report.yml

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update .github/ISSUE_TEMPLATE/bug-report.yml

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update .github/ISSUE_TEMPLATE/feature-request.yml

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update .github/ISSUE_TEMPLATE/bug-report.yml

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update .github/ISSUE_TEMPLATE/bug-report.yml

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Address review comments

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-04-18 16:43:21 -04:00
51e0ebedcb Allow passing encoder_ouputs as tuple to EncoderDecoder Models (#16814)
* Add passing encoder_outputs as tuple to existing test

* Add check for tuple

* Add check for tuple also for speech and vision

Co-authored-by: jsnfly <jsnfly@gmx.de>
2022-04-18 19:49:58 +02:00
51fa7191b1 use base_version to check torch version in torch_less_than_1_11 (#16806)
* use base_version

* make is_torch_less_than_1_8 match 1_11

Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
2022-04-18 13:02:00 -04:00
8d3f952adb [Data2Vec] Add data2vec vision (#16760)
* save intermediate

* add vision

* add vision

* save

* finish models

* finish models

* continue

* finish

* up

* up

* up

* tests all pass

* clean up

* up

* up

* fix bugs in beit

* correct docs

* finish

* finish docs

* make style

* up

* more fixes

* fix type hint

* make style

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/data2vec/test_modeling_data2vec_vision.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix test

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-04-18 17:52:13 +02:00
33cd4be576 fix megatron bert convert state dict naming (#15820) 2022-04-18 11:34:36 -04:00
9a2995ee39 [Quicktour Audio] Improve && remove ffmpeg dependency (#16723)
* [Quicktour Audio] Improve && remove ffmpeg dependency

* final fix

* final touches
2022-04-18 16:50:13 +02:00
d3c9d0e55f [ViT, BEiT, DeiT, DPT] Improve code (#16799)
* Improve code

* Fix bugs

* Fix another bug

* Clean up DTP as well

* Update DPT model outputs

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-04-18 09:25:08 -04:00
3785f4665a Fix syntax error in TorchHub workflow 2022-04-18 07:54:00 -04:00
6984848ed0 Create empty venv on cache miss (#16816) 2022-04-18 07:49:31 -04:00
438144832e Raise error and suggestion when using custom optimizer with Fairscale or Deepspeed (#16786)
* optimizer issues related to saving

* remove the "optimizer saving" option

* reformat using make style
2022-04-18 07:47:21 -04:00
b4ddd2677c TF generate refactor - XLA sample (#16713) 2022-04-18 10:58:24 +01:00
02de7a8e7f CI: non-remote GH Actions now use a python venv (#16789) 2022-04-18 09:47:38 +01:00
dee6f01636 Pin Jax to last working release (#16808)
* Pin Jax to last working release

* Try lower

* Try lower
2022-04-16 21:15:19 -04:00
78f346c2b5 Update README.md (#16797) 2022-04-15 14:10:16 +02:00
ee209d4d01 Fix PT TF ViTMAE (#16766)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-04-15 06:37:10 +02:00
5da33f8729 [modeling utils] revamp from_pretrained(..., low_cpu_mem_usage=True) + tests (#16657)
* add low_cpu_mem_usage tests

* wip: revamping

* wip

* install /usr/bin/time

* wip

* cleanup

* cleanup

* cleanup

* cleanup

* cleanup

* fix assert

* put the wrapper back

* cleanup; switch to bert-base-cased

* Trigger CI

* Trigger CI
2022-04-14 18:10:05 -07:00
ce2fef2ad2 [trainer / deepspeed] fix hyperparameter_search (#16740)
* [trainer / deepspeed] fix hyperparameter_search

* require optuna

* style

* oops

* add dep in the right place

* create deepspeed-testing dep group

* Trigger CI
2022-04-14 17:24:38 -07:00
1b7de41a07 Fix issue avoid-missing-comma found at https://codereview.doctor (#16768) 2022-04-14 16:42:27 -04:00
de8b06f9bf [SpeechEncoderDecoderModel] Fix bug in reshaping labels (#16748) 2022-04-14 19:02:40 +01:00
048443db86 Improve image classification example (#16585)
* Improve README

* Make dataset_name argument optional

* Improve local data

* Fix bug

* Improve README some more

* Apply suggestions from code review

* Improve README

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-04-14 18:10:52 +02:00
3e4eec47f5 Kill async pushes when calling push_to_hub with blocking=True (#16755) 2022-04-14 10:02:29 -04:00
c21e1071a7 [deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop (#16717)
* [deepspeed / m2m_100] make deepspeed 3 work with layerdrop

* fix

* revert last
2022-04-14 06:51:55 -07:00
89293a0f6b Make nightly install dev accelerate (#16783) 2022-04-14 09:41:02 -04:00
b151ddb9b9 Fix batch size in evaluation loop (#16763)
* Fix batch size in evaluation loop

* remove debug statement
2022-04-14 09:22:54 -04:00
1827 changed files with 181004 additions and 27955 deletions

File diff suppressed because it is too large Load Diff

3
.gitattributes vendored
View File

@ -1,3 +1,4 @@
*.py eol=lf
*.rst eol=lf
*.md eol=lf
*.md eol=lf
*.mdx eol=lf

View File

@ -1,22 +0,0 @@
---
name: "\U0001F5A5 New benchmark"
about: Benchmark a part of this library and share your results
title: "[Benchmark]"
labels: ''
assignees: ''
---
# 🖥 Benchmarking `transformers`
## Benchmark
Which part of `transformers` did you benchmark?
## Set-up
What did you run your benchmarks on? Please include details, such as: CPU, GPU? If using multiple GPUs, which parallelization did you use?
## Results
Put your results here!

View File

@ -1,20 +0,0 @@
---
name: "\U0001F31F New model addition"
about: Submit a proposal/request to implement a new Transformer-based model
title: ''
labels: New model
assignees: ''
---
# 🌟 New model addition
## Model description
<!-- Important information -->
## Open source status
* [ ] the model implementation is available: (give details)
* [ ] the model weights are available: (give details)
* [ ] who are the authors: (mention them, if possible by @gh-username)

View File

@ -1,107 +0,0 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve transformers
title: ''
labels: ''
assignees: ''
---
## Environment info
<!-- You can run the command `transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
### Who can help
<!-- Your issue will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
Models:
- ALBERT, BERT, XLM, DeBERTa, DeBERTa-v2, ELECTRA, MobileBert, SqueezeBert: @LysandreJik
- T5, Pegasus, EncoderDecoder: @patrickvonplaten
- Blenderbot, MBART, BART, Marian, Pegasus: @patil-suraj
- Reformer, TransfoXL, XLNet, FNet: @patrickvonplaten
- Longformer, BigBird: @ydshieh
- FSMT: @stas00
- Funnel: @sgugger
- GPT-2, GPT: @patil-suraj, @patrickvonplaten, @LysandreJik
- RAG, DPR: @patrickvonplaten, @lhoestq
- TensorFlow: @Rocketknight1
- JAX/Flax: @patil-suraj
- TAPAS, LayoutLM, LayoutLMv2, LUKE, ViT, BEiT, DEiT, DETR, CANINE: @NielsRogge
- GPT-Neo, GPT-J, CLIP: @patil-suraj
- Wav2Vec2, HuBERT, SpeechEncoderDecoder, UniSpeech, UniSpeechSAT, SEW, SEW-D, Speech2Text: @patrickvonplaten, @anton-l
If the model isn't in the list, ping @LysandreJik who will redirect you to the correct contributor.
Library:
- Benchmarks: @patrickvonplaten
- Deepspeed: @stas00
- Ray/raytune: @richardliaw, @amogkam
- Text generation: @patrickvonplaten @narsil
- Tokenizers: @SaulLu
- Trainer: @sgugger
- Pipelines: @Narsil
- Speech: @patrickvonplaten, @anton-l
- Vision: @NielsRogge, @sgugger
Documentation: @sgugger
Model hub:
- for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator.
HF projects:
- datasets: [different repo](https://github.com/huggingface/datasets)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Examples:
- maintained examples (not research project or legacy): @sgugger, @patil-suraj
For research projetcs, please ping the contributor directly. For example, on the following projects:
- research_projects/bert-loses-patience: @JetRunner
- research_projects/distillation: @VictorSanh
-->
## Information
Model I am using (Bert, XLNet ...):
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The tasks I am working on is:
* [ ] an official GLUE/SQUaD task: (give the name)
* [ ] my own task or dataset: (give details below)
## To reproduce
Steps to reproduce the behavior:
1.
2.
3.
<!-- If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.-->
## Expected behavior
<!-- A clear and concise description of what you would expect to happen. -->

119
.github/ISSUE_TEMPLATE/bug-report.yml vendored Normal file
View File

@ -0,0 +1,119 @@
name: "\U0001F41B Bug Report"
description: Submit a bug report to help us improve transformers
labels: [ "bug" ]
body:
- type: textarea
id: system-info
attributes:
label: System Info
description: Please share your system info with us. You can run the command `transformers-cli env` and copy-paste its output below.
placeholder: transformers version, platform, python version, ...
validations:
required: true
- type: textarea
id: who-can-help
attributes:
label: Who can help?
description: |
Your issue will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
Please tag fewer than 3 people.
Models:
- ALBERT, BERT, XLM, DeBERTa, DeBERTa-v2, ELECTRA, MobileBert, SqueezeBert: `@LysandreJik`
- T5, Pegasus, EncoderDecoder: `@patrickvonplaten`
- Blenderbot, MBART, BART, Marian, Pegasus: `@patil-suraj`
- Reformer, TransfoXL, XLNet, FNet: `@patrickvonplaten`
- Longformer, BigBird: `@ydshieh`
- FSMT: `@stas00`
- Funnel: `@sgugger`
- GPT-2, GPT: `@patil-suraj`, `@patrickvonplaten`, `@LysandreJik`
- RAG, DPR: `@patrickvonplaten`, `@lhoestq`
- TensorFlow: `@Rocketknight1`
- JAX/Flax: `@patil-suraj`
- TAPAS, LayoutLM, LayoutLMv2, LUKE, ViT, BEiT, DEiT, DETR, CANINE: `@NielsRogge`
- GPT-Neo, GPT-J, CLIP: `@patil-suraj`
- Wav2Vec2, HuBERT, UniSpeech, UniSpeechSAT, SEW, SEW-D: `@patrickvonplaten`, `@anton-l`
- SpeechEncoderDecoder, Speech2Text, Speech2Text2: `@sanchit-gandhi`, `@patrickvonplaten`, `@anton-l`
If the model isn't in the list, ping `@LysandreJik` who will redirect you to the correct contributor.
Library:
- Benchmarks: `@patrickvonplaten`
- Deepspeed: `@stas00`
- Ray/raytune: `@richardliaw`, `@amogkam`
- Text generation: `@patrickvonplaten`, `@Narsil`, `@gante`
- Tokenizers: `@SaulLu`
- Trainer: `@sgugger`
- Pipelines: `@Narsil`
- Speech: `@patrickvonplaten`, `@anton-l`, `@sanchit-gandhi`
- Vision: `@NielsRogge`, `@sgugger`
Documentation: `@sgugger`, `@stevhliu`
Model hub:
- for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator.
HF projects:
- datasets: [different repo](https://github.com/huggingface/datasets)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Examples:
- maintained examples (not research project or legacy): `@sgugger`, `@patil-suraj`
For research projetcs, please ping the contributor directly. For example, on the following projects:
- research_projects/bert-loses-patience: `@JetRunner`
- research_projects/distillation: `@VictorSanh`
placeholder: "@Username ..."
- type: checkboxes
id: information-scripts-examples
attributes:
label: Information
description: 'The problem arises when using:'
options:
- label: "The official example scripts"
- label: "My own modified scripts"
- type: checkboxes
id: information-tasks
attributes:
label: Tasks
description: "The tasks I am working on are:"
options:
- label: "An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"
- label: "My own task or dataset (give details below)"
- type: textarea
id: reproduction
validations:
required: true
attributes:
label: Reproduction
description: |
Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
placeholder: |
Steps to reproduce the behavior:
1.
2.
3.
- type: textarea
id: expected-behavior
validations:
required: true
attributes:
label: Expected behavior
description: "A clear and concise description of what you would expect to happen."

12
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@ -0,0 +1,12 @@
blank_issues_enabled: true
version: 2.1
contact_links:
- name: Model checkpoints on the Hugging Face Hub
url: https://huggingface.co/models
about: Open a Pull request / Discussion related to a specific model checkpoint directly on the Hugging Face Hub
- name: Website Related
url: https://github.com/huggingface/hub-docs/issues
about: Feature requests and bug reports related to the website
- name: Forum
url: https://discuss.huggingface.co/
about: General usage questions and community discussions

View File

@ -1,25 +0,0 @@
---
name: "\U0001F680 Feature request"
about: Submit a proposal/request for a new transformers feature
title: ''
labels: ''
assignees: ''
---
# 🚀 Feature request
<!-- A clear and concise description of the feature proposal.
Please provide a link to the paper and code in case they exist. -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request
related to a problem? e.g., I'm always frustrated when [...]. If this is related
to another GitHub issue, please link here too. -->
## Your contribution
<!-- Is there any way that you could help, e.g. by submitting a PR?
Make sure to read the CONTRIBUTING.MD readme:
https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md -->

View File

@ -0,0 +1,31 @@
name: "\U0001F680 Feature request"
description: Submit a proposal/request for a new transformers feature
labels: [ "feature" ]
body:
- type: textarea
id: feature-request
validations:
required: true
attributes:
label: Feature request
description: |
A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.
- type: textarea
id: motivation
validations:
required: true
attributes:
label: Motivation
description: |
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
- type: textarea
id: contribution
validations:
required: true
attributes:
label: Your contribution
description: |
Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md)

View File

@ -1,58 +0,0 @@
---
name: "\U0001F4DA Migration from pytorch-pretrained-bert or pytorch-transformers"
about: Report a problem when migrating from pytorch-pretrained-bert or pytorch-transformers
to transformers
title: ''
labels: Migration
assignees: ''
---
# 📚 Migration
## Information
<!-- Important information -->
Model I am using (Bert, XLNet ...):
Language I am using the model on (English, Chinese ...):
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The tasks I am working on is:
* [ ] an official GLUE/SQUaD task: (give the name)
* [ ] my own task or dataset: (give details below)
## Details
<!-- A clear and concise description of the migration issue.
If you have code snippets, please provide it here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
-->
## Environment info
<!-- You can run the command `python transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
<!-- IMPORTANT: which version of the former library do you use? -->
* `pytorch-transformers` or `pytorch-pretrained-bert` version (or branch):
## Checklist
- [ ] I have read the migration guide in the readme.
([pytorch-transformers](https://github.com/huggingface/transformers#migrating-from-pytorch-transformers-to-transformers);
[pytorch-pretrained-bert](https://github.com/huggingface/transformers#migrating-from-pytorch-pretrained-bert-to-transformers))
- [ ] I checked if a related official extension example runs on my machine.

72
.github/ISSUE_TEMPLATE/migration.yml vendored Normal file
View File

@ -0,0 +1,72 @@
name: "\U0001F4DA Migration from pytorch-pretrained-bert or pytorch-transformers"
description: Report a problem when migrating from pytorch-pretrained-bert or pytorch-transformers to transformers
labels: [ "migration" ]
body:
- type: textarea
id: system-info
attributes:
label: System Info
description: Please share your system info with us. You can run the command `transformers-cli env` and copy-paste its output below.
render: shell
placeholder: transformers version, platform, python version, ...
validations:
required: true
- type: checkboxes
id: information-scripts-examples
attributes:
label: Information
description: 'The problem arises when using:'
options:
- label: "The official example scripts"
- label: "My own modified scripts"
- type: checkboxes
id: information-tasks
attributes:
label: Tasks
description: "The tasks I am working on are:"
options:
- label: "An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"
- label: "My own task or dataset (give details below)"
- type: textarea
id: reproduction
validations:
required: true
attributes:
label: Reproduction
description: |
Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
placeholder: |
Steps to reproduce the behavior:
1.
2.
3.
- type: textarea
id: expected-behavior
validations:
required: true
attributes:
label: Expected behavior
description: "A clear and concise description of what you would expect to happen."
render: shell
- type: checkboxes
id: checklist
attributes:
label: Checklist
options:
- label: "I have read the migration guide in the readme.
([pytorch-transformers](https://github.com/huggingface/transformers#migrating-from-pytorch-transformers-to-transformers);
[pytorch-pretrained-bert](https://github.com/huggingface/transformers#migrating-from-pytorch-pretrained-bert-to-transformers))"
required: true
- label: "I checked if a related official extension example runs on my machine."
required: true

View File

@ -0,0 +1,31 @@
name: "\U0001F31F New model addition"
description: Submit a proposal/request to implement a new model
labels: [ "New model" ]
body:
- type: textarea
id: description-request
validations:
required: true
attributes:
label: Model description
description: |
Put any and all important information relative to the model
- type: checkboxes
id: information-tasks
attributes:
label: Open source status
description: |
Please note that if the model implementation isn't available or if the weights aren't open-source, we are less likely to implement it in `transformers`.
options:
- label: "The model implementation is available"
- label: "The model weights are available"
- type: textarea
id: additional-info
attributes:
label: Provide useful links for the implementation
description: |
Please provide information regarding the implementation, the weights, and the authors.
Please mention the authors by @gh-username if you're aware of their usernames.

View File

@ -1,26 +0,0 @@
---
name: "❓ Questions & Help"
about: Post your general questions on the Hugging Face forum: https://discuss.huggingface.co/
title: ''
labels: ''
assignees: ''
---
# ❓ Questions & Help
<!-- The GitHub issue tracker is primarly intended for bugs, feature requests,
new models, benchmarks, and migration questions. For all other questions,
we direct you to the Hugging Face forum: https://discuss.huggingface.co/ .
-->
## Details
<!-- Description of your issue -->
<!-- You should first ask your question on the forum, and only if
you didn't get an answer after a few days ask it here on GitHub. -->
**A link to original question on the forum**:
<!-- Your issue will be closed if you don't fill this part. -->

View File

@ -25,7 +25,7 @@ requirements:
- sacremoses
- regex !=2019.12.17
- protobuf
- tokenizers >=0.10.1,<0.11.0
- tokenizers >=0.11.1,!=0.11.3,<0.13
- pyyaml >=5.1
run:
- python
@ -40,7 +40,7 @@ requirements:
- sacremoses
- regex !=2019.12.17
- protobuf
- tokenizers >=0.10.1,<0.11.0
- tokenizers >=0.11.1,!=0.11.3,<0.13
- pyyaml >=5.1
test:

View File

@ -18,32 +18,54 @@ jobs:
steps:
- uses: actions/checkout@v2
- name: Loading cache.
- name: Install dependencies
run: |
sudo apt -y update && sudo apt install -y libsndfile1-dev
- name: Load cached virtual environment
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v1-tests_model_like-${{ hashFiles('setup.py') }}
path: ~/venv/
key: v4-tests_model_like-${{ hashFiles('setup.py') }}
- name: Install dependencies
- name: Create virtual environment on cache miss
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv ~/venv && . ~/venv/bin/activate
pip install --upgrade pip!=21.3
pip install -U click # Click 7 is installed in the environment by default, but we need at least version 8 for Black
sudo apt -y update && sudo apt install -y libsndfile1-dev
pip install .[dev]
pip install -e .[dev]
- name: Check transformers location
# make `transformers` available as package (required since we use `-e` flag) and check it's indeed from the repo.
run: |
. ~/venv/bin/activate
python setup.py develop
transformers_install=$(pip list -e | grep transformers)
transformers_install_array=($transformers_install)
transformers_loc=${transformers_install_array[-1]}
transformers_repo_loc=$(pwd .)
if [ "$transformers_loc" != "$transformers_repo_loc" ]; then
echo "transformers is from $transformers_loc but it shoud be from $transformers_repo_loc/src."
echo "A fix is required. Stop testing."
exit 1
fi
- name: Create model files
run: |
. ~/venv/bin/activate
transformers-cli add-new-model-like --config_file tests/fixtures/add_distilbert_like_config.json --path_to_repo .
make style
make fix-copies
- name: Run all PyTorch modeling test
run: |
. ~/venv/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_new_models tests/bert_new/test_modeling_bert_new.py
- name: Run style changes
run: |
. ~/venv/bin/activate
make style && make quality && make repo-consistency
- name: Failure short reports

View File

@ -5,6 +5,7 @@ on:
branches:
- docker-image*
repository_dispatch:
workflow_call:
schedule:
- cron: "0 1 * * *"
@ -39,9 +40,35 @@ jobs:
push: true
tags: huggingface/transformers-all-latest-gpu
latest-with-torch-nightly-docker:
name: "Nightly PyTorch + Stable TensorFlow"
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
REF=main
PYTORCH=pre
push: true
tags: huggingface/transformers-all-latest-torch-nightly-gpu
latest-torch-deepspeed-docker:
name: "Latest PyTorch + DeepSpeed"
needs: latest-docker
runs-on: ubuntu-latest
steps:
-
@ -66,6 +93,32 @@ jobs:
push: true
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu
nightly-torch-deepspeed-docker:
name: "Nightly PyTorch + DeepSpeed"
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-pytorch-deepspeed-nightly-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-nightly-gpu
doc-builder:
name: "Doc builder"
runs-on: ubuntu-latest
@ -93,7 +146,6 @@ jobs:
latest-pytorch:
name: "Latest PyTorch [dev]"
runs-on: ubuntu-latest
needs: latest-torch-deepspeed-docker
steps:
-
name: Set up Docker Buildx
@ -118,7 +170,6 @@ jobs:
tags: huggingface/transformers-pytorch-gpu
latest-tensorflow:
needs: latest-pytorch
name: "Latest TensorFlow [dev]"
runs-on: ubuntu-latest
steps:

View File

@ -0,0 +1,108 @@
name: Build docker images (Past CI)
on:
push:
branches:
- past-ci-docker-image*
concurrency:
group: docker-images-builds
cancel-in-progress: false
jobs:
past-pytorch-docker:
name: "Past PyTorch Docker"
strategy:
fail-fast: false
matrix:
version: ["1.11", "1.10", "1.9", "1.8", "1.7", "1.6", "1.5", "1.4"]
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-past-gpu
build-args: |
REF=main
FRAMEWORK=pytorch
VERSION=${{ matrix.version }}
push: true
tags: huggingface/transformers-pytorch-past-${{ matrix.version }}-gpu
past-tensorflow-docker:
name: "Past TensorFlow Docker"
strategy:
fail-fast: false
matrix:
version: ["2.8", "2.7", "2.6", "2.5"]
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-past-gpu
build-args: |
REF=main
FRAMEWORK=tensorflow
VERSION=${{ matrix.version }}
push: true
tags: huggingface/transformers-tensorflow-past-${{ matrix.version }}-gpu
past-tensorflow-docker-2-4:
name: "Past TensorFlow Docker"
strategy:
fail-fast: false
matrix:
version: ["2.4"]
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-past-gpu
build-args: |
REF=main
BASE_DOCKER_IMAGE=nvidia/cuda:11.0.3-cudnn8-devel-ubuntu20.04
FRAMEWORK=tensorflow
VERSION=${{ matrix.version }}
push: true
tags: huggingface/transformers-tensorflow-past-${{ matrix.version }}-gpu

View File

@ -15,6 +15,6 @@ jobs:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: en es
languages: de en es it pt
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}

View File

@ -14,4 +14,4 @@ jobs:
commit_sha: ${{ github.event.pull_request.head.sha }}
pr_number: ${{ github.event.number }}
package: transformers
languages: en es
languages: de en es it pt

View File

@ -32,9 +32,7 @@ jobs:
- name: GPU visibility
run: |
utils/print_env_pt.py
TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
python3 utils/print_env.py
- name: Prepare files for doctests
run: |

View File

@ -1,46 +0,0 @@
name: Torch hub integration
on:
push:
branches:
- "*"
jobs:
torch_hub_integration:
runs-on: ubuntu-latest
env:
# TODO quickfix but may need more investigation
ACTIONS_ALLOW_UNSECURE_COMMANDS: True
steps:
# no checkout necessary here.
- name: Extract branch name
run: echo "::set-env name=BRANCH::${GITHUB_REF#refs/heads/}"
- name: Check branch name
run: echo $BRANCH
- name: Set up Python
uses: actions/setup-python@v1
with:
python-version: 3.7
- name: Loading cache
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v0-torch_hub-${{ hashFiles('setup.py') }}
- name: Install dependencies
run: |
pip install --upgrade pip
# install torch-hub specific dependencies
pip install -e git+https://github.com/huggingface/transformers.git#egg=transformers[torchhub]
# no longer needed
pip uninstall -y transformers
#- name: Torch hub list
# run: |
# python -c "import torch; print(torch.hub.list('huggingface/transformers:$BRANCH'))"
#- name: Torch hub help
# run: |
# python -c "import torch; print(torch.hub.help('huggingface/transformers:$BRANCH', 'modelForSequenceClassification'))"

View File

@ -1,43 +1,51 @@
name: Model templates runner
on:
push:
branches:
- main
pull_request:
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
types: [assigned, opened, synchronize, reopened]
repository_dispatch:
schedule:
- cron: "0 2 * * *"
jobs:
run_tests_templates:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v1
- name: Install Python
uses: actions/setup-python@v1
with:
python-version: 3.6
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v1.2-tests_templates-${{ hashFiles('setup.py') }}
uses: actions/checkout@v2
- name: Install dependencies
run: |
pip install --upgrade pip!=21.3
sudo apt -y update && sudo apt install -y libsndfile1-dev
pip install .[dev]
- name: Load cached virtual environment
uses: actions/cache@v2
id: cache
with:
path: ~/venv/
key: v4-tests_templates-${{ hashFiles('setup.py') }}
- name: Create virtual environment on cache miss
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv ~/venv && . ~/venv/bin/activate
pip install --upgrade pip!=21.3
pip install -e .[dev]
- name: Check transformers location
# make `transformers` available as package (required since we use `-e` flag) and check it's indeed from the repo.
run: |
. ~/venv/bin/activate
python setup.py develop
transformer_loc=$(pip show transformers | grep "Location: " | cut -c11-)
transformer_repo_loc=$(pwd .)
if [ "$transformer_loc" != "$transformer_repo_loc/src" ]; then
echo "transformers is from $transformer_loc but it shoud be from $transformer_repo_loc/src."
echo "A fix is required. Stop testing."
exit 1
fi
- name: Create model files
run: |
. ~/venv/bin/activate
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/encoder-bert-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/standalone.json --path=templates/adding_a_new_model
@ -53,11 +61,12 @@ jobs:
- name: Run all non-slow tests
run: |
. ~/venv/bin/activate
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_templates tests/*template*
- name: Run style changes
run: |
git fetch origin main:main
. ~/venv/bin/activate
make style && make quality && make repo-consistency
- name: Failure short reports

View File

@ -1,250 +1,264 @@
name: Self-hosted runner; Nightly (scheduled)
name: Self-hosted runner (nightly)
# Note that each job's dependencies go into a corresponding docker file.
#
# For example for `run_all_tests_torch_cuda_extensions_gpu` the docker image is
# `huggingface/transformers-pytorch-deepspeed-latest-gpu`, which can be found at
# `docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile`
on:
push:
branches:
- nightly_ci*
repository_dispatch:
schedule:
- cron: "0 0 */3 * *"
repository_dispatch:
# Disable temporarily until the test suite can be run under 12 hours.
# schedule:
# - cron: "0 16 * * *"
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
RUN_SLOW: yes
OMP_NUM_THREADS: 16
MKL_NUM_THREADS: 16
PYTEST_TIMEOUT: 600
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
jobs:
run_all_tests_torch_gpu:
runs-on: [self-hosted, docker-gpu, single-gpu]
container:
image: pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
run_check_runners:
name: Check Runners
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-all-latest-torch-nightly-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: NVIDIA-SMI
run: |
nvidia-smi
setup:
name: Setup
needs: run_check_runners
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-all-latest-torch-nightly-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- name: Update clone
working-directory: /transformers
run: |
git fetch && git checkout ${{ github.sha }}
- name: Install dependencies
run: |
apt -y update && apt install -y libsndfile1-dev git espeak-ng
pip install --upgrade pip
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
pip install https://github.com/kpu/kenlm/archive/master.zip
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
- name: Cleanup
working-directory: /transformers
run: |
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- name: Are GPUs recognized by our DL frameworks
run: |
utils/print_env_pt.py
- id: set-matrix
name: Identify models to test
working-directory: /transformers/tests
run: |
echo "::set-output name=matrix::$(python3 -c 'import os; tests = os.getcwd(); model_tests = os.listdir(os.path.join(tests, "models")); d1 = sorted(list(filter(os.path.isdir, os.listdir(tests)))); d2 = sorted(list(filter(os.path.isdir, [f"models/{x}" for x in model_tests]))); d1.remove("models"); d = d2 + d1; print(d)')"
- name: Run all tests on GPU
run: |
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_gpu tests
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_gpu_failures_short.txt
run_tests_single_gpu:
name: Model tests
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [single-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-all-latest-torch-nightly-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: [run_check_runners, setup]
steps:
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Run examples tests on GPU
if: ${{ always() }}
env:
OMP_NUM_THREADS: 16
MKL_NUM_THREADS: 16
RUN_SLOW: yes
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
run: |
pip install -r examples/pytorch/_tests_requirements.txt
python -m pytest -n 1 -v --dist=loadfile --make-reports=examples_torch_gpu examples
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Failure short reports
if: ${{ always() }}
run: cat reports/examples_torch_gpu_failures_short.txt
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Run all pipeline tests on GPU
if: ${{ always() }}
env:
RUN_PIPELINE_TESTS: yes
run: |
python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_gpu tests
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_pipeline_gpu_failures_short.txt
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }}
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_torch_gpu_test_reports
path: reports
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
run_all_tests_torch_multi_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
container:
image: pytorch/pytorch:1.10.0-cuda11.3-cudnn8-runtime
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
- name: NVIDIA-SMI
continue-on-error: true
run: |
nvidia-smi
run_tests_multi_gpu:
name: Model tests
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-all-latest-torch-nightly-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: [run_check_runners, setup]
steps:
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Install dependencies
run: |
apt -y update && apt install -y libsndfile1-dev git espeak-ng
pip install --upgrade pip
pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
pip install https://github.com/kpu/kenlm/archive/master.zip
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Are GPUs recognized by our DL frameworks
run: |
utils/print_env_pt.py
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Run all tests on GPU
env:
MKL_SERVICE_FORCE_INTEL: 1
run: |
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_multi_gpu tests
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_multi_gpu_failures_short.txt
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }}
- name: Run all pipeline tests on GPU
if: ${{ always() }}
env:
RUN_PIPELINE_TESTS: yes
run: |
python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_multi_gpu tests
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_pipeline_multi_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_torch_multi_gpu_test_reports
path: reports
run_all_tests_torch_cuda_extensions_gpu:
name: Torch CUDA extension tests
strategy:
fail-fast: false
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
needs: [run_check_runners, setup]
container:
image: huggingface/transformers-pytorch-deepspeed-nightly-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /workspace/transformers
run: git fetch && git checkout ${{ github.sha }}
run_all_tests_torch_cuda_extensions_gpu:
runs-on: [self-hosted, docker-gpu, single-gpu]
container:
image: nvcr.io/nvidia/pytorch:21.03-py3
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
- name: NVIDIA-SMI
run: |
nvidia-smi
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
python3 -m pip uninstall -y deepspeed
rm -rf DeepSpeed
git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: Install dependencies
run: |
apt -y update && apt install -y libaio-dev libsndfile1-dev git espeak-ng
pip install --upgrade pip
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
pip install .[testing,deepspeed]
pip install https://github.com/kpu/kenlm/archive/master.zip
pip install git+https://github.com/microsoft/DeepSpeed
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Are GPUs recognized by our DL frameworks
run: |
utils/print_env_pt.py
- name: Environment
working-directory: /workspace/transformers
run: |
python utils/print_env.py
- name: Run all tests on GPU
run: |
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
- name: Run all tests on GPU
working-directory: /workspace/transformers
run: |
python -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_cuda_extensions_gpu_failures_short.txt
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_tests_torch_cuda_extensions_gpu_test_reports
path: reports
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machine_type }}_run_tests_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu
run_all_tests_torch_cuda_extensions_multi_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
container:
image: nvcr.io/nvidia/pytorch:21.03-py3
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
if: always()
needs: [run_check_runners, setup, run_tests_single_gpu, run_tests_multi_gpu, run_all_tests_torch_cuda_extensions_gpu]
steps:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
echo "Runner status: ${{ needs.run_check_runners.result }}"
echo "Setup status: ${{ needs.setup.result }}"
- name: NVIDIA-SMI
continue-on-error: true
run: |
nvidia-smi
- name: Install dependencies
run: |
apt -y update && apt install -y libaio-dev libsndfile1-dev git espeak-ng
pip install --upgrade pip
pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
rm -rf ~/.cache/torch_extensions/ # shared between conflicting builds
pip install .[testing,fairscale]
pip install https://github.com/kpu/kenlm/archive/master.zip
pip install git+https://github.com/microsoft/DeepSpeed # testing bleeding edge
- name: Are GPUs recognized by our DL frameworks
run: |
utils/print_env_pt.py
- name: Run all tests on GPU
run: |
python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_cuda_extensions_multi_gpu tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_torch_cuda_extensions_multi_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_tests_torch_cuda_extensions_multi_gpu_test_reports
path: reports
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
if: always()
needs: [
run_all_tests_torch_gpu,
run_all_tests_torch_multi_gpu,
run_all_tests_torch_cuda_extensions_gpu,
run_all_tests_torch_cuda_extensions_multi_gpu
]
steps:
- uses: actions/checkout@v2
- uses: actions/download-artifact@v2
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_ID_PAST_FUTURE: ${{ secrets.CI_SLACK_CHANNEL_ID_PAST_FUTURE }}
run: |
pip install slack_sdk
python utils/notification_service.py scheduled nightly-torch
- uses: actions/checkout@v2
- uses: actions/download-artifact@v2
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_PAST_FUTURE }}
CI_EVENT: nightly-build
SETUP_STATUS: ${{ needs.setup.result }}
RUNNER_STATUS: ${{ needs.run_check_runners.result }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
run: |
pip install slack_sdk
python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"

136
.github/workflows/self-past-caller.yml vendored Normal file
View File

@ -0,0 +1,136 @@
name: Self-hosted runner (past-ci-caller)
on:
push:
branches:
- run-past-ci*
jobs:
run_past_ci_pytorch_1-11:
name: PyTorch 1.11
if: always()
uses: ./.github/workflows/self-past.yml
with:
framework: pytorch
version: "1.11"
secrets: inherit
run_past_ci_pytorch_1-10:
name: PyTorch 1.10
if: always()
needs: [run_past_ci_pytorch_1-11]
uses: ./.github/workflows/self-past.yml
with:
framework: pytorch
version: "1.10"
secrets: inherit
run_past_ci_pytorch_1-9:
name: PyTorch 1.9
if: always()
needs: [run_past_ci_pytorch_1-10]
uses: ./.github/workflows/self-past.yml
with:
framework: pytorch
version: "1.9"
secrets: inherit
run_past_ci_pytorch_1-8:
name: PyTorch 1.8
if: always()
needs: [run_past_ci_pytorch_1-9]
uses: ./.github/workflows/self-past.yml
with:
framework: pytorch
version: "1.8"
secrets: inherit
run_past_ci_pytorch_1-7:
name: PyTorch 1.7
if: always()
needs: [run_past_ci_pytorch_1-8]
uses: ./.github/workflows/self-past.yml
with:
framework: pytorch
version: "1.7"
secrets: inherit
run_past_ci_pytorch_1-6:
name: PyTorch 1.6
if: always()
needs: [run_past_ci_pytorch_1-7]
uses: ./.github/workflows/self-past.yml
with:
framework: pytorch
version: "1.6"
secrets: inherit
run_past_ci_pytorch_1-5:
name: PyTorch 1.5
if: always()
needs: [run_past_ci_pytorch_1-6]
uses: ./.github/workflows/self-past.yml
with:
framework: pytorch
version: "1.5"
secrets: inherit
run_past_ci_pytorch_1-4:
name: PyTorch 1.4
if: always()
needs: [run_past_ci_pytorch_1-5]
uses: ./.github/workflows/self-past.yml
with:
framework: pytorch
version: "1.4"
secrets: inherit
run_past_ci_tensorflow_2-8:
name: TensorFlow 2.8
if: always()
needs: [run_past_ci_pytorch_1-4]
uses: ./.github/workflows/self-past.yml
with:
framework: tensorflow
version: "2.8"
secrets: inherit
run_past_ci_tensorflow_2-7:
name: TensorFlow 2.7
if: always()
needs: [run_past_ci_tensorflow_2-8]
uses: ./.github/workflows/self-past.yml
with:
framework: tensorflow
version: "2.7"
secrets: inherit
run_past_ci_tensorflow_2-6:
name: TensorFlow 2.6
if: always()
needs: [run_past_ci_tensorflow_2-7]
uses: ./.github/workflows/self-past.yml
with:
framework: tensorflow
version: "2.6"
secrets: inherit
run_past_ci_tensorflow_2-5:
name: TensorFlow 2.5
if: always()
needs: [run_past_ci_tensorflow_2-6]
uses: ./.github/workflows/self-past.yml
with:
framework: tensorflow
version: "2.5"
secrets: inherit
run_past_ci_tensorflow_2-4:
name: TensorFlow 2.4
if: always()
needs: [run_past_ci_tensorflow_2-5]
uses: ./.github/workflows/self-past.yml
with:
framework: tensorflow
version: "2.4"
secrets: inherit

216
.github/workflows/self-past.yml vendored Normal file
View File

@ -0,0 +1,216 @@
name: Self-hosted runner (past)
# Note that each job's dependencies go into a corresponding docker file.
#
# For example for `run_all_tests_torch_cuda_extensions_gpu` the docker image is
# `huggingface/transformers-pytorch-deepspeed-latest-gpu`, which can be found at
# `docker/transformers-pytorch-deepspeed-latest-gpu/Dockerfile`
on:
workflow_call:
inputs:
framework:
required: true
type: string
version:
required: true
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
jobs:
setup:
name: Setup
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
steps:
- name: Checkout transformers
uses: actions/checkout@v2
with:
fetch-depth: 2
- name: Cleanup
run: |
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- id: set-matrix
name: Identify models to test
run: |
cd tests
echo "::set-output name=matrix::$(python3 -c 'import os; tests = os.getcwd(); model_tests = os.listdir(os.path.join(tests, "models")); d1 = sorted(list(filter(os.path.isdir, os.listdir(tests)))); d2 = sorted(list(filter(os.path.isdir, [f"models/{x}" for x in model_tests]))); d1.remove("models"); d = d2 + d1; print(d)')"
run_check_runners:
name: Check Runners
needs: setup
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker-past-ci') }}
container:
image: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: NVIDIA-SMI
run: |
nvidia-smi
run_tests_single_gpu:
name: Model tests
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [single-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker-past-ci') }}
container:
image: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: [setup, run_check_runners]
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
run_tests_multi_gpu:
name: Model tests
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker-past-ci') }}
container:
image: huggingface/transformers-${{ inputs.framework }}-past-${{ inputs.version }}-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: [setup, run_check_runners]
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
if: always()
needs: [setup, run_check_runners, run_tests_single_gpu, run_tests_multi_gpu]
steps:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
echo "Runner status: ${{ needs.run_check_runners.result }}"
echo "Setup status: ${{ needs.setup.result }}"
- uses: actions/checkout@v2
- uses: actions/download-artifact@v2
# Create a directory to store test failure tables in the next step
- name: Create directory
run: mkdir test_failure_tables
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_PAST_FUTURE }}
CI_EVENT: Past CI - ${{ inputs.framework }}-${{ inputs.version }}
SETUP_STATUS: ${{ needs.setup.result }}
RUNNER_STATUS: ${{ needs.run_check_runners.result }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
run: |
pip install slack_sdk
python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"
# Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack.
- name: Failure table artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: test_failure_tables_${{ inputs.framework }}-${{ inputs.version }}
path: test_failure_tables

52
.github/workflows/self-push-caller.yml vendored Normal file
View File

@ -0,0 +1,52 @@
# Used to trigger self-push CI
name: Self-hosted runner (push-caller)
on:
push:
branches:
- main
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
check-for-setup:
runs-on: ubuntu-latest
name: Check if setup was changed
outputs:
changed: ${{ steps.was_changed.outputs.changed }}
steps:
- uses: actions/checkout@v3
with:
fetch-depth: "2"
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@v22.2
- name: Was setup changed
id: was_changed
run: |
for file in ${{ steps.changed-files.outputs.all_changed_files }}; do
if [ `basename "${file}"` = "setup.py" ]; then
echo ::set-output name=changed::"1"
fi
done
build-docker-containers:
needs: check-for-setup
if: (github.event_name == 'push') && (needs.check-for-setup.outputs.changed == '1')
uses: ./.github/workflows/build-docker-images.yml
secrets: inherit
run_push_ci:
name: Trigger Push CI
runs-on: ubuntu-latest
if: ${{ always() }}
needs: build-docker-containers
steps:
- name: Trigger push CI via workflow_run
run: echo "Trigger push CI via workflow_run"

View File

@ -1,9 +1,12 @@
name: Self-hosted runner (push)
on:
workflow_run:
workflows: ["Self-hosted runner (push-caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- main
- ci_*
- ci-*
paths:
@ -20,37 +23,68 @@ env:
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 60
TF_FORCE_GPU_ALLOW_GROWTH: true
RUN_PT_TF_CROSS_TESTS: 1
jobs:
run_tests_torch_gpu:
runs-on: [self-hosted, docker-gpu, single-gpu]
container:
image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
setup:
name: Setup
runs-on: ubuntu-latest
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
test_map: ${{ steps.set-matrix.outputs.test_map }}
steps:
- name: Install dependencies
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# `CI_BRANCH_PUSH`: The branch name from the push event
# `CI_BRANCH_WORKFLOW_RUN`: The name of the branch on which this workflow is triggered by `workflow_run` event
# `CI_BRANCH`: The non-empty branch name from the above two (one and only one of them is empty)
# `CI_SHA_PUSH`: The commit SHA from the push event
# `CI_SHA_WORKFLOW_RUN`: The commit SHA that triggers this workflow by `workflow_run` event
# `CI_SHA`: The non-empty commit SHA from the above two (one and only one of them is empty)
run: |
apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
apt install -y libsndfile1-dev espeak-ng
pip install --upgrade pip
pip install .[sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
pip install https://github.com/kpu/kenlm/archive/master.zip
CI_BRANCH_PUSH=${{ github.event.ref }}
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
CI_BRANCH_WORKFLOW_RUN=${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH=${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN=${{ github.event.workflow_run.head_sha }}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: Launcher docker
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Checkout transformers
uses: actions/checkout@v2
with:
fetch-depth: 2
- name: NVIDIA-SMI
- name: Update clone using environment variables
run: |
nvidia-smi
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Are GPUs recognized by our DL frameworks
- name: Cleanup
run: |
utils/print_env_pt.py
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- name: Fetch the tests to run
# TODO: add `git-python` in the docker images
run: |
pip install --upgrade git-python
python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
- name: Report fetched tests
@ -59,441 +93,445 @@ jobs:
name: test_fetched
path: test_preparation.txt
- name: Run all non-slow tests on GPU
- id: set-matrix
name: Organize tests into models
# The `keys` is used as GitHub actions matrix for jobs, i.e. `models/bert`, `tokenization`, `pipeline`, etc.
# The `test_map` is used to get the actual identified test files under each key.
# If no test to run (so no `test_map.json` file), create a dummy map (empty matrix will fail)
run: |
if [ -f test_list.txt ]; then
python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_torch_gpu $(cat test_list.txt)
if [ -f test_map.json ]; then
keys=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); d = list(test_map.keys()); print(d)')
test_map=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); print(test_map)')
else
keys=$(python3 -c 'keys = ["dummy"]; print(keys)')
test_map=$(python3 -c 'test_map = {"dummy": []}; print(test_map)')
fi
echo $keys
echo $test_map
echo "::set-output name=matrix::$keys"
echo "::set-output name=test_map::$test_map"
run_check_runners:
name: Check Runners
needs: setup
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: [self-hosted, docker-gpu, '${{ matrix.machine_type }}']
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: NVIDIA-SMI
run: |
nvidia-smi
run_tests_single_gpu:
name: Model tests
needs: [setup, run_check_runners]
# `dummy` means there is no test to run
if: contains(fromJson(needs.setup.outputs.matrix), 'dummy') != true
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [single-gpu]
runs-on: [self-hosted, docker-gpu, '${{ matrix.machine_type }}']
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${{ github.event.ref }}
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
CI_BRANCH_WORKFLOW_RUN=${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH=${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN=${{ github.event.workflow_run.head_sha }}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
echo "${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Run all non-slow selected tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_torch_gpu_failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_torch_gpu_test_reports
path: reports
run_tests_flax_gpu:
runs-on: [self-hosted, docker-gpu-test, single-gpu]
container:
image: tensorflow/tensorflow:2.4.1-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Set up Python 3.7
uses: actions/setup-python@v2
with:
python-version: 3.7
- name: Install dependencies
run: |
apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
pip install --upgrade pip
pip install .[sklearn,testing,sentencepiece,flax,flax-speech,vision]
- name: Launcher docker
uses: actions/checkout@v2
with:
fetch-depth: 2
- name: NVIDIA-SMI
continue-on-error: true
run: |
nvidia-smi
- name: Are GPUs recognized by our DL frameworks
run: |
python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
- name: Fetch the tests to run
run: |
python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
- name: Report fetched tests
uses: actions/upload-artifact@v2
with:
name: test_fetched
path: test_preparation.txt
- name: Run all non-slow tests on GPU
run: |
if [ -f test_list.txt ]; then
python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_flax_gpu $(cat test_list.txt)
fi
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_flax_gpu_failures_short.txt
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_flax_gpu_test_reports
path: reports
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
# run_tests_tf_gpu:
# runs-on: [self-hosted, docker-gpu, single-gpu]
# timeout-minutes: 120
# container:
# image: tensorflow/tensorflow:2.4.1-gpu
# options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
# steps:
# - name: Install dependencies
# run: |
# apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
# pip install --upgrade pip
# pip install .[sklearn,testing,onnxruntime,sentencepiece,tf-speech]
# pip install https://github.com/kpu/kenlm/archive/master.zip
#
# - name: Launcher docker
# uses: actions/checkout@v2
# with:
# fetch-depth: 2
#
# - name: NVIDIA-SMI
# run: |
# nvidia-smi
#
# - name: Are GPUs recognized by our DL frameworks
# run: |
# TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
# TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
#
# - name: Fetch the tests to run
# run: |
# python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
#
# - name: Report fetched tests
# uses: actions/upload-artifact@v2
# with:
# name: test_fetched
# path: test_preparation.txt
#
# - name: Run all non-slow tests on GPU
# env:
# TF_NUM_INTRAOP_THREADS: 8
# TF_NUM_INTEROP_THREADS: 1
# run: |
# if [ -f test_list.txt ]; then
# python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_gpu $(cat test_list.txt)
# fi
#
# - name: Failure short reports
# if: ${{ failure() }}
# run: cat reports/tests_tf_gpu_failures_short.txt
#
# - name: Test suite reports artifacts
# if: ${{ always() }}
# uses: actions/upload-artifact@v2
# with:
# name: run_all_tests_tf_gpu_test_reports
# path: reports
run_tests_torch_multi_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
run_tests_multi_gpu:
name: Model tests
needs: [setup, run_check_runners]
# `dummy` means there is no test to run
if: contains(fromJson(needs.setup.outputs.matrix), 'dummy') != true
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [multi-gpu]
runs-on: [self-hosted, docker-gpu, '${{ matrix.machine_type }}']
container:
image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
image: huggingface/transformers-all-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Install dependencies
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
apt install -y libsndfile1-dev espeak-ng
pip install --upgrade pip
pip install .[sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
pip install https://github.com/kpu/kenlm/archive/master.zip
- name: Launcher docker
uses: actions/checkout@v2
with:
fetch-depth: 2
CI_BRANCH_PUSH=${{ github.event.ref }}
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
CI_BRANCH_WORKFLOW_RUN=${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH=${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN=${{ github.event.workflow_run.head_sha }}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
echo "${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: NVIDIA-SMI
continue-on-error: true
run: |
nvidia-smi
- name: Are GPUs recognized by our DL frameworks
- name: Environment
working-directory: /transformers
run: |
utils/print_env_pt.py
python3 utils/print_env.py
- name: Fetch the tests to run
run: |
python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
- name: Report fetched tests
uses: actions/upload-artifact@v2
with:
name: test_fetched
path: test_preparation.txt
- name: Run all non-slow tests on GPU
- name: Run all non-slow selected tests on GPU
env:
MKL_SERVICE_FORCE_INTEL: 1
working-directory: /transformers
run: |
if [ -f test_list.txt ]; then
python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_torch_multi_gpu $(cat test_list.txt)
fi
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_torch_multi_gpu_failures_short.txt
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_torch_multi_gpu_test_reports
path: reports
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
# run_tests_flax_multi_gpu:
# runs-on: [self-hosted, docker-gpu, multi-gpu]
# container:
# image: tensorflow/tensorflow:2.4.1-gpu
# options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
# steps:
# - name: Install dependencies
# run: |
# apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
# pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
# pip install --upgrade pip
# pip install .[sklearn,testing,sentencepiece,flax,flax-speech,vision]
# pip install https://github.com/kpu/kenlm/archive/master.zip
#
# - name: Launcher docker
# uses: actions/checkout@v2
# with:
# fetch-depth: 2
#
# - name: NVIDIA-SMI
# continue-on-error: true
# run: |
# nvidia-smi
#
# - name: Are GPUs recognized by our DL frameworks
# run: |
# python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
# python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
#
# - name: Fetch the tests to run
# run: |
# python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
#
# - name: Report fetched tests
# uses: actions/upload-artifact@v2
# with:
# name: test_fetched
# path: test_preparation.txt
#
# - name: Run all non-slow tests on GPU
# run: |
# if [ -f test_list.txt ]; then
# python -m pytest -n 2 --dist=loadfile -v --make-reports=tests_flax_multi_gpu $(cat test_list.txt)
# fi
#
# - name: Failure short reports
# if: ${{ failure() }}
# run: cat reports/tests_flax_multi_gpu_failures_short.txt
#
# - name: Test suite reports artifacts
# if: ${{ always() }}
# uses: actions/upload-artifact@v2
# with:
# name: run_all_tests_flax_multi_gpu_test_reports
# path: reports
# run_tests_tf_multi_gpu:
# runs-on: [self-hosted, docker-gpu, multi-gpu]
# timeout-minutes: 120
# container:
# image: tensorflow/tensorflow:2.4.1-gpu
# options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
# steps:
# - name: Install dependencies
# run: |
# apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
# pip install --upgrade pip
# pip install .[sklearn,testing,onnxruntime,sentencepiece,tf-speech]
# pip install https://github.com/kpu/kenlm/archive/master.zip
#
# - name: Launcher docker
# uses: actions/checkout@v2
# with:
# fetch-depth: 2
#
# - name: NVIDIA-SMI
# run: |
# nvidia-smi
#
# - name: Are GPUs recognized by our DL frameworks
# run: |
# TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
# TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
#
# - name: Fetch the tests to run
# run: |
# python utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
#
# - name: Report fetched tests
# uses: actions/upload-artifact@v2
# with:
# name: test_fetched
# path: test_preparation.txt
#
# - name: Run all non-slow tests on GPU
# env:
# TF_NUM_INTRAOP_THREADS: 8
# TF_NUM_INTEROP_THREADS: 1
# run: |
# if [ -f test_list.txt ]; then
# python -m pytest -n 1 --dist=loadfile --make-reports=tests_tf_multi_gpu $(cat test_list.txt)
# fi
#
# - name: Failure short reports
# if: ${{ failure() }}
# run: cat reports/tests_tf_multi_gpu_failures_short.txt
#
# - name: Test suite reports artifacts
# if: ${{ always() }}
# uses: actions/upload-artifact@v2
# with:
# name: run_all_tests_tf_multi_gpu_test_reports
# path: reports
run_tests_torch_cuda_extensions_gpu:
runs-on: [self-hosted, docker-gpu, single-gpu]
run_tests_torch_cuda_extensions_single_gpu:
name: Torch CUDA extension tests
needs: [setup, run_check_runners]
if: contains(fromJson(needs.setup.outputs.matrix), 'deepspeed') || contains(fromJson(needs.setup.outputs.matrix), 'extended')
strategy:
fail-fast: false
matrix:
machine_type: [single-gpu]
runs-on: [self-hosted, docker-gpu, '${{ matrix.machine_type }}']
container:
image: nvcr.io/nvidia/pytorch:21.03-py3
image: huggingface/transformers-pytorch-deepspeed-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
with:
fetch-depth: 2
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${{ github.event.ref }}
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
CI_BRANCH_WORKFLOW_RUN=${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH=${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN=${{ github.event.workflow_run.head_sha }}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /workspace/transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
python3 -m pip uninstall -y deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Install dependencies
- name: Environment
working-directory: /workspace/transformers
run: |
apt -y update && apt install -y libaio-dev
pip install --upgrade pip
pip install .[testing,deepspeed]
python utils/print_env.py
- name: Are GPUs recognized by our DL frameworks
- name: Run all non-slow selected tests on GPU
working-directory: /workspace/transformers
# TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests.
run: |
utils/print_env_pt.py
- name: Fetch the tests to run
run: |
python utils/tests_fetcher.py --diff_with_last_commit --filters tests/deepspeed tests/extended | tee test_preparation.txt
- name: Report fetched tests
uses: actions/upload-artifact@v2
with:
name: test_fetched
path: test_preparation.txt
- name: Run all tests on GPU
run: |
if [ -f test_list.txt ]; then
python -m pytest -n 1 --dist=loadfile -v --make-reports=tests_torch_cuda_extensions_gpu $(cat test_list.txt)
fi
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_torch_cuda_extensions_gpu_failures_short.txt
continue-on-error: true
run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_tests_torch_cuda_extensions_gpu_test_reports
path: reports
name: ${{ matrix.machine_type }}_run_tests_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu
run_tests_torch_cuda_extensions_multi_gpu:
runs-on: [self-hosted, docker-gpu, multi-gpu]
name: Torch CUDA extension tests
needs: [setup, run_check_runners]
if: contains(fromJson(needs.setup.outputs.matrix), 'deepspeed') || contains(fromJson(needs.setup.outputs.matrix), 'extended')
strategy:
fail-fast: false
matrix:
machine_type: [multi-gpu]
runs-on: [self-hosted, docker-gpu, '${{ matrix.machine_type }}']
container:
image: nvcr.io/nvidia/pytorch:21.03-py3
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
image: huggingface/transformers-pytorch-deepspeed-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Launcher docker
uses: actions/checkout@v2
with:
fetch-depth: 2
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${{ github.event.ref }}
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
CI_BRANCH_WORKFLOW_RUN=${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH=${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN=${{ github.event.workflow_run.head_sha }}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /workspace/transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
python3 -m pip uninstall -y deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
continue-on-error: true
run: |
nvidia-smi
- name: Install dependencies
- name: Environment
working-directory: /workspace/transformers
run: |
apt -y update && apt install -y libaio-dev
pip install --upgrade pip
rm -rf ~/.cache/torch_extensions/ # shared between conflicting builds
pip install .[testing,deepspeed,fairscale]
python utils/print_env.py
- name: Are GPUs recognized by our DL frameworks
- name: Run all non-slow selected tests on GPU
working-directory: /workspace/transformers
# TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests.
run: |
utils/print_env_pt.py
- name: Fetch the tests to run
run: |
python utils/tests_fetcher.py --diff_with_last_commit --filters tests/deepspeed tests/extended | tee test_preparation.txt
- name: Report fetched tests
uses: actions/upload-artifact@v2
with:
name: test_fetched
path: test_preparation.txt
- name: Run all tests on GPU
run: |
if [ -f test_list.txt ]; then
python -m pytest -n 1 --dist=loadfile -v --make-reports=tests_torch_cuda_extensions_multi_gpu $(cat test_list.txt)
fi
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
run: cat reports/tests_torch_cuda_extensions_multi_gpu_failures_short.txt
continue-on-error: true
run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_tests_torch_cuda_extensions_multi_gpu_test_reports
path: reports
name: ${{ matrix.machine_type }}_run_tests_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
if: always()
needs: [
run_tests_torch_gpu,
# run_tests_tf_gpu,
run_tests_torch_multi_gpu,
# run_tests_tf_multi_gpu,
run_tests_torch_cuda_extensions_gpu,
setup,
run_check_runners,
run_tests_single_gpu,
run_tests_multi_gpu,
run_tests_torch_cuda_extensions_single_gpu,
run_tests_torch_cuda_extensions_multi_gpu
]
steps:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
echo "Setup status: ${{ needs.setup.result }}"
echo "Runner status: ${{ needs.run_check_runners.result }}"
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${{ github.event.ref }}
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
CI_BRANCH_WORKFLOW_RUN=${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH=${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN=${{ github.event.workflow_run.head_sha }}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- uses: actions/checkout@v2
- uses: actions/download-artifact@v2
- name: Update clone using environment variables
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- uses: actions/download-artifact@v2
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_EVENT: push
CI_TITLE_PUSH: ${{ github.event.head_commit.message }}
CI_TITLE_WORKFLOW_RUN: ${{ github.event.workflow_run.head_commit.message }}
CI_SHA: ${{ env.CI_SHA }}
SETUP_STATUS: ${{ needs.setup.result }}
RUNNER_STATUS: ${{ needs.run_check_runners.result }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
run: |
pip install slack_sdk
python utils/notification_service_deprecated.py push
python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"

View File

@ -22,12 +22,27 @@ env:
RUN_PT_TF_CROSS_TESTS: 1
jobs:
setup:
name: Setup
run_check_runners:
name: Check Runners
strategy:
matrix:
machines: [multi-gpu-docker, single-gpu-docker]
runs-on: ${{ matrix.machines }}
machine_type: [single-gpu, multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: NVIDIA-SMI
run: |
nvidia-smi
setup:
name: Setup
needs: run_check_runners
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
@ -43,60 +58,124 @@ jobs:
working-directory: /transformers
run: |
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- id: set-matrix
name: Identify models to test
working-directory: /transformers/tests
run: |
echo "::set-output name=matrix::$(python3 -c 'import os; x = list(filter(os.path.isdir, os.listdir(os.getcwd()))); x.sort(); print(x)')"
echo "::set-output name=matrix::$(python3 -c 'import os; tests = os.getcwd(); model_tests = os.listdir(os.path.join(tests, "models")); d1 = sorted(list(filter(os.path.isdir, os.listdir(tests)))); d2 = sorted(list(filter(os.path.isdir, [f"models/{x}" for x in model_tests]))); d1.remove("models"); d = d2 + d1; print(d)')"
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: GPU visibility
working-directory: /transformers
run: |
utils/print_env_pt.py
TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
run_tests_gpu:
run_tests_single_gpu:
name: Model tests
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machines: [multi-gpu-docker, single-gpu-docker]
runs-on: ${{ matrix.machines }}
machine_type: [single-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: setup
needs: [run_check_runners, setup]
steps:
- name: Echo folder ${{ matrix.folders }}
run: echo "${{ matrix.folders }}"
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -v --make-reports=${{ matrix.machines }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }}
run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machines }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machines }}_run_all_tests_gpu_${{ matrix.folders }}_test_reports
path: /transformers/reports/${{ matrix.machines }}_tests_gpu_${{ matrix.folders }}
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
run_tests_multi_gpu:
name: Model tests
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: [run_check_runners, setup]
steps:
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_tests_gpu_${{ matrix.folders }}
run_examples_gpu:
name: Examples directory
@ -104,109 +183,136 @@ jobs:
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: setup
needs: [run_check_runners, setup]
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Run examples tests on GPU
working-directory: /transformers
run: |
pip install -r examples/pytorch/_tests_requirements.txt
python3 -m pytest -v --make-reports=examples_gpu examples/pytorch
python3 -m pytest -v --make-reports=single-gpu_examples_gpu examples/pytorch
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/examples_gpu/failures_short.txt
run: cat /transformers/reports/single-gpu_examples_gpu/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_examples_gpu
path: /transformers/reports/examples_gpu
name: single-gpu_run_examples_gpu
path: /transformers/reports/single-gpu_examples_gpu
run_pipelines_torch_gpu:
name: PyTorch pipelines
strategy:
fail-fast: false
matrix:
machines: [multi-gpu-docker, single-gpu-docker]
runs-on: ${{ matrix.machines }}
machine_type: [single-gpu, multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-pytorch-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: setup
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: [run_check_runners, setup]
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Run all pipeline tests on GPU
working-directory: /transformers
env:
RUN_PIPELINE_TESTS: yes
run: |
python3 -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=${{ matrix.machines }}_tests_torch_pipeline_gpu tests
python3 -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=${{ matrix.machine_type }}_tests_torch_pipeline_gpu tests
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machines }}_tests_torch_pipeline_gpu/failures_short.txt
run: cat /transformers/reports/${{ matrix.machine_type }}_tests_torch_pipeline_gpu/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machines }}_run_tests_torch_pipeline_gpu
path: /transformers/reports/${{ matrix.machines }}_tests_torch_pipeline_gpu
name: ${{ matrix.machine_type }}_run_tests_torch_pipeline_gpu
path: /transformers/reports/${{ matrix.machine_type }}_tests_torch_pipeline_gpu
run_pipelines_tf_gpu:
name: TensorFlow pipelines
strategy:
fail-fast: false
matrix:
machines: [multi-gpu-docker, single-gpu-docker]
runs-on: ${{ matrix.machines }}
machine_type: [single-gpu, multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
container:
image: huggingface/transformers-tensorflow-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: setup
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
needs: [run_check_runners, setup]
steps:
- name: Update clone
working-directory: /transformers
run: |
git fetch && git checkout ${{ github.sha }}
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Run all pipeline tests on GPU
working-directory: /transformers
env:
RUN_PIPELINE_TESTS: yes
run: |
python3 -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=${{ matrix.machines }}_tests_tf_pipeline_gpu tests
python3 -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=${{ matrix.machine_type }}_tests_tf_pipeline_gpu tests
- name: Failure short reports
if: ${{ always() }}
run: |
cat /transformers/reports/${{ matrix.machines }}_tests_tf_pipeline_gpu/failures_short.txt
cat /transformers/reports/${{ matrix.machine_type }}_tests_tf_pipeline_gpu/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machines }}_run_tests_tf_pipeline_gpu
path: /transformers/reports/${{ matrix.machines }}_tests_tf_pipeline_gpu
name: ${{ matrix.machine_type }}_run_tests_tf_pipeline_gpu
path: /transformers/reports/${{ matrix.machine_type }}_tests_tf_pipeline_gpu
run_all_tests_torch_cuda_extensions_gpu:
name: Torch CUDA extension tests
strategy:
fail-fast: false
matrix:
machines: [multi-gpu-docker, single-gpu-docker]
runs-on: ${{ matrix.machines }}
needs: setup
machine_type: [single-gpu, multi-gpu]
runs-on: ${{ format('{0}-{1}', matrix.machine_type, 'docker') }}
needs: [run_check_runners, setup]
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
@ -215,38 +321,64 @@ jobs:
working-directory: /workspace/transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Re-compile DeepSpeed
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
pip install deepspeed # installs the deps correctly
rm -rf DeepSpeed
git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
python3 -m pip uninstall -y deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /workspace/transformers
run: |
python utils/print_env.py
- name: Run all tests on GPU
working-directory: /workspace/transformers
run: |
python -m pytest -v --make-reports=${{ matrix.machines }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
python -m pytest -v --make-reports=${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /workspace/transformers/reports/${{ matrix.machines }}_tests_torch_cuda_extensions_gpu/failures_short.txt
run: cat /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: ${{ matrix.machines }}_run_tests_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ matrix.machines }}_tests_torch_cuda_extensions_gpu
name: ${{ matrix.machine_type }}_run_tests_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ matrix.machine_type }}_tests_torch_cuda_extensions_gpu
send_results:
name: Send results to webhook
runs-on: ubuntu-latest
if: always()
needs: [setup, run_tests_gpu, run_examples_gpu, run_pipelines_tf_gpu, run_pipelines_torch_gpu, run_all_tests_torch_cuda_extensions_gpu]
needs: [
run_check_runners,
setup,
run_tests_single_gpu,
run_tests_multi_gpu,
run_examples_gpu,
run_pipelines_tf_gpu,
run_pipelines_torch_gpu,
run_all_tests_torch_cuda_extensions_gpu
]
steps:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
echo "Runner status: ${{ needs.run_check_runners.result }}"
echo "Setup status: ${{ needs.setup.result }}"
- uses: actions/checkout@v2
- uses: actions/download-artifact@v2
- name: Send message to Slack
@ -255,6 +387,12 @@ jobs:
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_EVENT: scheduled
SETUP_STATUS: ${{ needs.setup.result }}
RUNNER_STATUS: ${{ needs.run_check_runners.result }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
run: |
pip install slack_sdk
python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"

View File

@ -16,17 +16,25 @@ jobs:
steps:
- uses: actions/checkout@v2
- name: Loading cache.
- name: Load cached virtual environment
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v1-metadata-${{ hashFiles('setup.py') }}
path: ~/venv/
key: v3-metadata-${{ hashFiles('setup.py') }}
- name: Create virtual environment on cache miss
if: steps.cache.outputs.cache-hit != 'true'
run: |
python -m venv ~/venv && . ~/venv/bin/activate
pip install --upgrade pip
- name: Setup environment
run: |
. ~/venv/bin/activate
pip install git+https://github.com/huggingface/transformers#egg=transformers[dev]
- name: Update metadata
run: |
. ~/venv/bin/activate
python utils/update_metadata.py --token ${{ secrets.SYLVAIN_HF_TOKEN }} --commit_sha ${{ github.sha }}

View File

@ -128,7 +128,7 @@ You will need basic `git` proficiency to be able to contribute to
manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
Git](https://git-scm.com/book/en/v2) is a very good reference.
Follow these steps to start contributing:
Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/transformers/blob/main/setup.py#L426)):
1. Fork the [repository](https://github.com/huggingface/transformers) by
clicking on the 'Fork' button on the repository's page. This creates a copy of the code
@ -171,6 +171,14 @@ Follow these steps to start contributing:
If you have already cloned that repo, you might need to `git pull` to get the most recent changes in the `datasets`
library.
Depending on your OS, you might need to install some external libraries, as well, if the `pip` installation fails.
For macOS, you will likely need [MeCab](https://taku910.github.io/mecab/), which can be installed from Homebrew:
```bash
brew install mecab
```
5. Develop the features on your branch.
@ -368,8 +376,7 @@ For documentation strings, 🤗 Transformers follows the [google style](https://
Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/main/docs#writing-documentation---specification)
for more information.
#### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/main/CONTRIBUTING.md)
**This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/main/CONTRIBUTING.md).**
### Develop on Windows

View File

@ -9,7 +9,7 @@ modified_only_fixup:
$(eval modified_py_files := $(shell python utils/get_modified_files.py $(check_dirs)))
@if test -n "$(modified_py_files)"; then \
echo "Checking/fixing $(modified_py_files)"; \
black $(modified_py_files); \
black --preview $(modified_py_files); \
isort $(modified_py_files); \
flake8 $(modified_py_files); \
else \
@ -39,27 +39,32 @@ repo-consistency:
python utils/check_dummies.py
python utils/check_repo.py
python utils/check_inits.py
python utils/check_config_docstrings.py
python utils/tests_fetcher.py --sanity_check
# this target runs checks on all files
quality:
black --check $(check_dirs)
black --check --preview $(check_dirs)
isort --check-only $(check_dirs)
python utils/custom_init_isort.py --check_only
python utils/sort_auto_mappings.py --check_only
flake8 $(check_dirs)
doc-builder style src/transformers docs/source --max_len 119 --check_only --path_to_docs docs/source
python utils/check_doc_toc.py
# Format source code automatically and check is there are any problems left that need manual fixing
extra_style_checks:
python utils/custom_init_isort.py
python utils/sort_auto_mappings.py
doc-builder style src/transformers docs/source --max_len 119 --path_to_docs docs/source
python utils/check_doc_toc.py --fix_and_overwrite
# this target runs checks on all files and potentially modifies some of them
style:
black $(check_dirs)
black --preview $(check_dirs)
isort $(check_dirs)
${MAKE} autogenerate_code
${MAKE} extra_style_checks

150
README.md
View File

@ -87,12 +87,16 @@ Here are a few examples:
In Computer Vision:
- [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224)
- [Object Detection with DETR](https://huggingface.co/facebook/detr-resnet-50)
- [Image Segmentation with DETR](https://huggingface.co/facebook/detr-resnet-50-panoptic)
- [Semantic Segmentation with SegFormer](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512)
- [Panoptic Segmentation with DETR](https://huggingface.co/facebook/detr-resnet-50-panoptic)
In Audio:
- [Automatic Speech Recognition with Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h)
- [Keyword Spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
In Multimodal tasks:
- [Visual Question Answering with ViLT](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa)
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repos text generation capabilities.
## If you are looking for custom support from the Hugging Face team
@ -116,24 +120,48 @@ To immediately use a model on a given input (text, image, audio, ...), we provid
The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%.
Many NLP tasks have a pre-trained `pipeline` ready to go. For example, we can easily extract question answers given context:
Many tasks have a pre-trained `pipeline` ready to go, in NLP but also in computer vision and speech. For example, we can easily extract detected objects in an image:
``` python
>>> import requests
>>> from PIL import Image
>>> from transformers import pipeline
# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
... 'question': 'What is the name of the repository ?',
... 'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}
# Download an image with cute cats
>>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png"
>>> image_data = requests.get(url, stream=True).raw
>>> image = Image.open(image_data)
# Allocate a pipeline for object detection
>>> object_detector = pipeline('object_detection')
>>> object_detector(image)
[{'score': 0.9982201457023621,
'label': 'remote',
'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
{'score': 0.9960021376609802,
'label': 'remote',
'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
{'score': 0.9954745173454285,
'label': 'couch',
'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
{'score': 0.9988006353378296,
'label': 'cat',
'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
{'score': 0.9986783862113953,
'label': 'cat',
'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}]
```
In addition to the answer, the pretrained model used here returned its confidence score, along with the start position and end position of the answer in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).
Here we get a list of objects detected in the image, with a box surrounding the object and a confidence score. Here is the original image on the right, with the predictions displayed on the left:
To download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:
<h3 align="center">
<a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png" width="400"></a>
<a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample_post_processed.png" width="400"></a>
</h3>
You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).
In addition to `pipeline`, to download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:
```python
>>> from transformers import AutoTokenizer, AutoModel
@ -143,6 +171,7 @@ To download and use any of the pretrained models on your given task, all it take
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
```
And here is the equivalent code for TensorFlow:
```python
>>> from transformers import AutoTokenizer, TFAutoModel
@ -156,7 +185,7 @@ And here is the equivalent code for TensorFlow:
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the above examples) or a list. It will output a dictionary that you can use in downstream code or simply directly pass to your model using the ** argument unpacking operator.
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. [This tutorial](https://huggingface.co/docs/transformers/training) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune on a new dataset.
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use as usual. [This tutorial](https://huggingface.co/docs/transformers/training) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune on a new dataset.
## Why should I use transformers?
@ -169,7 +198,7 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta
1. Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Dozens of architectures with over 20,000 pretrained models, some in more than 100 languages.
- Dozens of architectures with over 60,000 pretrained models across all modalities.
1. Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
@ -184,7 +213,7 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta
## Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library (possibly, [Accelerate](https://huggingface.co/docs/accelerate)).
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/main/examples) are just that: examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
## Installation
@ -220,6 +249,8 @@ conda install -c huggingface transformers
Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.
> **_NOTE:_** On Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If this is not an option for you, please let us know in [this issue](https://github.com/huggingface/huggingface_hub/issues/1062).
## Model architectures
**[All the model checkpoints](https://huggingface.co/models)** provided by 🤗 Transformers are seamlessly integrated from the huggingface.co [model hub](https://huggingface.co) where they are uploaded directly by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations).
@ -234,77 +265,96 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/main/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[Data2Vec](https://huggingface.co/docs/transformers/main/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval
for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon
Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](https://huggingface.co/docs/transformers/main/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/docs/transformers/main/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/main/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/main/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noahs Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/main/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/main/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[RegNet](https://huggingface.co/docs/transformers/main/model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[ResNet](https://huggingface.co/docs/transformers/main/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
@ -313,34 +363,40 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/main/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](https://huggingface.co/docs/transformers/main/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER
AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](https://huggingface.co/docs/transformers/main/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[ViLT](https://huggingface.co/docs/transformers/main/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[ViTMAE](https://huggingface.co/docs/transformers/main/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[WavLM](https://huggingface.co/docs/transformers/main/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[YOSO](https://huggingface.co/docs/transformers/main/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).

View File

@ -221,16 +221,19 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/main/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[Data2Vec](https://huggingface.co/docs/transformers/main/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
@ -239,51 +242,69 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) and a German version of DistilBERT.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER) released with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](https://huggingface.co/docs/transformers/main/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/docs/transformers/main/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/main/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/main/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noahs Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/main/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/main/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](https://huggingface.co/docs/transformers/main/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](https://huggingface.co/docs/transformers/main/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
@ -292,24 +313,30 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/main/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](https://huggingface.co/docs/transformers/main/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](https://huggingface.co/docs/transformers/main/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[ViLT](https://huggingface.co/docs/transformers/main/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](https://huggingface.co/docs/transformers/main/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/main/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
@ -318,7 +345,8 @@ Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOSO](https://huggingface.co/docs/transformers/main/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. 새로운 모델을 올리고 싶나요? 우리가 **상세한 가이드와 템플릿** 으로 새로운 모델을 올리도록 도와드릴게요. 가이드와 템플릿은 이 저장소의 [`templates`](./templates) 폴더에서 확인하실 수 있습니다. [컨트리뷰션 가이드라인](./CONTRIBUTING.md)을 꼭 확인해주시고, PR을 올리기 전에 메인테이너에게 연락하거나 이슈를 오픈해 피드백을 받으시길 바랍니다.
각 모델이 Flax, PyTorch, TensorFlow으로 구현되었는지 또는 🤗 Tokenizers 라이브러리가 지원하는 토크나이저를 사용하는지 확인하려면, [이 표](https://huggingface.co/docs/transformers/index#supported-frameworks)를 확인하세요.

View File

@ -173,7 +173,7 @@ checkpoint: 检查点
- 对所有模型统一的API
1. 更低计算开销,更少的碳排放:
- 研究人员可以分享亿训练的模型而非次从头开始训练
- 研究人员可以分享训练的模型而非次从头开始训练
- 工程师可以减少计算用时和生产环境开销
- 数十种模型架构、两千多个预训练模型、100多种语言支持
@ -227,7 +227,7 @@ conda install -c huggingface transformers
## 模型架构
**🤗 Transformers 支持的[所有的模型检查点](https://huggingface.co/models)** 由[用户](https://huggingface.co/users)和[组织](https://huggingface.co/organizations)上传,均与 huggingface.co [model hub](https://huggingface.co) 无缝整合。
🤗 Transformers 支持的[**所有的模型检查点**](https://huggingface.co/models)由[用户](https://huggingface.co/users)和[组织](https://huggingface.co/organizations)上传,均与 huggingface.co [model hub](https://huggingface.co) 无缝整合。
目前的检查点数量: ![](https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen)
@ -245,16 +245,19 @@ conda install -c huggingface transformers
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (来自 Alexa) 伴随论文 [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) 由 Adrian de Wynter and Daniel J. Perry 发布。
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (来自 Google Research) 伴随论文 [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) 由 Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel 发布。
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (来自 Inria/Facebook/Sorbonne) 伴随论文 [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) 由 Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot 发布。
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (来自 Google Research) 伴随论文 [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) 由 Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting 发布。
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (来自 OpenAI) 伴随论文 [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) 由 Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever 发布。
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (来自 Salesforce) 伴随论文 [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) 由 Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong 发布。
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (来自 YituTech) 伴随论文 [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) 由 Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan 发布。
1. **[ConvNeXT](https://huggingface.co/docs/transformers/main/model_doc/convnext)** (来自 Facebook AI) 伴随论文 [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) 由 Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie 发布。
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (来自 Facebook AI) 伴随论文 [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) 由 Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie 发布。
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (来自 Tsinghua University) 伴随论文 [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) 由 Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun 发布。
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (来自 Salesforce) 伴随论文 [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) 由 Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher 发布。
1. **[Data2Vec](https://huggingface.co/docs/transformers/main/model_doc/data2vec)** (来自 Facebook) 伴随论文 [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) 由 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli 发布。
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (来自 Microsoft) 伴随论文 [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) 由 Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang 发布。
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (来自 Facebook) 伴随论文 [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) 由 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli 发布。
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (来自 Microsoft) 伴随论文 [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) 由 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen 发布。
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (来自 Microsoft) 伴随论文 [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) 由 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen 发布。
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (来自 Berkeley/Facebook/Google) 伴随论文 [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) 由 Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch 发布。
@ -263,51 +266,69 @@ conda install -c huggingface transformers
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (来自 Microsoft Research) 伴随论文 [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) 由 Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan 发布。
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (来自 HuggingFace), 伴随论文 [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) 由 Victor Sanh, Lysandre Debut and Thomas Wolf 发布。 同样的方法也应用于压缩 GPT-2 到 [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa 到 [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT 到 [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) 和德语版 DistilBERT。
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (来自 Microsoft Research) 伴随论文 [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) 由 Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei 发布。
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (来自 NAVER) 伴随论文 [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) 由 Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park 发布。
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (来自 Facebook) 伴随论文 [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) 由 Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih 发布。
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (来自 Intel Labs) 伴随论文 [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) 由 René Ranftl, Alexey Bochkovskiy, Vladlen Koltun 发布。
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (来自 Google Research) 伴随论文 [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) 由 Sascha Rothe, Shashi Narayan, Aliaksei Severyn 发布。
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (来自 Baidu) 伴随论文 [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu 发布。
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (来自 CNRS) 伴随论文 [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) 由 Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab 发布。
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (来自 Facebook AI) 伴随论文 [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) 由 Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela 发布。
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (来自 Google Research) 伴随论文 [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) 由 James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon 发布。
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (来自 CMU/Google Brain) 伴随论文 [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) 由 Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le 发布。
1. **[GLPN](https://huggingface.co/docs/transformers/main/model_doc/glpn)** (来自 KAIST) 伴随论文 [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) 由 Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim 发布。
1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (来自 KAIST) 伴随论文 [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) 由 Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim 发布。
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (来自 OpenAI) 伴随论文 [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) 由 Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever 发布。
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (来自 EleutherAI) 随仓库 [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) 发布。作者为 Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy 发布。
1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (来自 OpenAI) 伴随论文 [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) 由 Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever** 发布。
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (来自 EleutherAI) 伴随论文 [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) 由 Ben Wang and Aran Komatsuzaki 发布。
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (来自 UCSD, NVIDIA) 伴随论文 [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) 由 Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang 发布。
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (来自 Facebook) 伴随论文 [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) 由 Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed 发布。
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (来自 Berkeley) 伴随论文 [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) 由 Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer 发布。
1. **[ImageGPT](https://huggingface.co/docs/transformers/main/model_doc/imagegpt)** (来自 OpenAI) 伴随论文 [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) 由 Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever 发布。
1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (来自 OpenAI) 伴随论文 [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) 由 Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever 发布。
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) 由 Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou 发布。
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) 由 Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou 发布。
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (来自 Microsoft Research Asia) 伴随论文 [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) 由 Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei 发布。
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) 由 Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei 发布。
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (来自 Microsoft Research Asia) 伴随论文 [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) 由 Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei 发布。
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (来自 AllenAI) 伴随论文 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) 由 Iz Beltagy, Matthew E. Peters, Arman Cohan 发布。
1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (来自 Meta AI) 伴随论文 [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) 由 Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze 发布。
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (来自 AllenAI) 伴随论文 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) 由 Iz Beltagy, Matthew E. Peters, Arman Cohan 发布。
1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (来自 Google AI) released 伴随论文 [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) 由 Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang 发布。
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (来自 Studio Ousia) 伴随论文 [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) 由 Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto 发布。
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (来自 UNC Chapel Hill) 伴随论文 [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) 由 Hao Tan and Mohit Bansal 发布。
1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (来自 Facebook) 伴随论文 [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) 由 Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert 发布。
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (来自 Facebook) 伴随论文 [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) 由 Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin 发布。
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** 用 [OPUS](http://opus.nlpl.eu/) 数据训练的机器翻译模型由 Jörg Tiedemann 发布。[Marian Framework](https://marian-nmt.github.io/) 由微软翻译团队开发。
1. **[MaskFormer](https://huggingface.co/docs/transformers/main/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (来自 Facebook) 伴随论文 [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) 由 Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer 发布。
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (来自 Facebook) 伴随论文 [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) 由 Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan 发布。
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (来自 Facebook) 伴随论文 [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) 由 Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer 发布。
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (来自 Facebook) 伴随论文 [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) 由 Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan 发布。
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (来自 NVIDIA) 伴随论文 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 由 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 发布。
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (来自 NVIDIA) 伴随论文 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 由 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 发布。
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (来自 Studio Ousia) 伴随论文 [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) 由 Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka 发布。
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (来自 CMU/Google Brain) 伴随论文 [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) 由 Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou 发布。
1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (来自 Apple) 伴随论文 [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) 由 Sachin Mehta and Mohammad Rastegari 发布。
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (来自 Microsoft Research) 伴随论文 [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) 由 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu 发布。
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (来自 Google AI) 伴随论文 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) 由 Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel 发布。
1. **[Nyströmformer](https://huggingface.co/docs/transformers/main/model_doc/nystromformer)** (来自 the University of Wisconsin - Madison) 伴随论文 [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) 由 Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh 发布。
1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (来自 中国人民大学 AI Box) 伴随论文 [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) 由 Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen 发布。
1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (来自华为诺亚方舟实验室) 伴随论文 [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) 由 Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu 发布。
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (来自 Meta) 伴随论文 [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) 由 the NLLB team 发布。
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (来自 the University of Wisconsin - Madison) 伴随论文 [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) 由 Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh 发布。
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (来自 Meta AI) 伴随论文 [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) 由 Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al 发布。
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (来自 Google AI) 伴随论文 [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) 由 Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby 发布。
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (来自 Google) 伴随论文 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 由 Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 发布。
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (来自 Google) 伴随论文 [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) 由 Jason Phang, Yao Zhao, Peter J. Liu 发布。
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (来自 Deepmind) 伴随论文 [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) 由 Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira 发布。
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。
1. **[PLBart](https://huggingface.co/docs/transformers/main/model_doc/plbart)** (来自 UCLA NLP) 伴随论文 [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) 由 Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang 发布。
1. **[PoolFormer](https://huggingface.co/docs/transformers/main/model_doc/poolformer)** (来自 Sea AI Labs) 伴随论文 [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) 由 Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng 发布。
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (来自 UCLA NLP) 伴随论文 [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) 由 Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang 发布。
1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (来自 Sea AI Labs) 伴随论文 [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) 由 Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng 发布。
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (来自 NVIDIA) 伴随论文 [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) 由 Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius 发布。
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (来自 Facebook) 伴随论文 [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) 由 Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela 发布。
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (来自 Google Research) 伴随论文 [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) 由 Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang 发布。
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (来自 Google Research) 伴随论文 [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) 由 Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya 发布。
1. **[RegNet](https://huggingface.co/docs/transformers/main/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (来自 Google Research) 伴随论文 [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) 由 Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder 发布。
1. **[ResNet](https://huggingface.co/docs/transformers/main/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (来自 Facebook), 伴随论文 [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) 由 Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov 发布。
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (来自 ZhuiyiTechnology), 伴随论文 [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) 由 Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu 发布。
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (来自 NVIDIA) 伴随论文 [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) 由 Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo 发布。
@ -316,24 +337,30 @@ conda install -c huggingface transformers
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (来自 Facebook), 伴随论文 [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) 由 Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino 发布。
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (来自 Facebook) 伴随论文 [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) 由 Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau 发布。
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (来自 Tel Aviv University) 伴随论文 [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) 由 Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy 发布。
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (来自 Berkeley) 伴随论文 [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) 由 Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer 发布。
1. **[Swin Transformer](https://huggingface.co/docs/transformers/main/model_doc/swin)** (来自 Microsoft) 伴随论文 [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) 由 Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo 发布。
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (来自 Berkeley) 伴随论文 [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) 由 Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer 发布。
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (来自 Microsoft) 伴随论文 [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) 由 Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo 发布。
1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (来自 Microsoft) 伴随论文 [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) 由 Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo 发布。
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (来自 Google AI) 伴随论文 [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) 由 Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu 发布。
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (来自 Google AI) 伴随论文 [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) 由 Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu 发布。
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (来自 Google AI) 伴随论文 [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) 由 Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos 发布。
1. **[TAPEX](https://huggingface.co/docs/transformers/main/model_doc/tapex)** (来自 Microsoft Research) 伴随论文 [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) 由 Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou 发布。
1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (来自 Microsoft Research) 伴随论文 [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) 由 Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou 发布。
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (来自 Google/CMU) 伴随论文 [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) 由 Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov 发布。
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (来自 Microsoft) 伴随论文 [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) 由 Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei 发布。
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (来自 Microsoft Research) 伴随论文 [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) 由 Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang 发布。
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (来自 Microsoft Research) 伴随论文 [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) 由 Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu 发布。
1. **[VAN](https://huggingface.co/docs/transformers/main/model_doc/van)** (来自 Tsinghua University and Nankai University) 伴随论文 [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) 由 Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu 发布。
1. **[ViLT](https://huggingface.co/docs/transformers/main/model_doc/vilt)** (来自 NAVER AI Lab/Kakao Enterprise/Kakao Brain) 伴随论文 [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) 由 Wonjae Kim, Bokyung Son, Ildoo Kim 发布。
1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (来自 Tsinghua University and Nankai University) 伴随论文 [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) 由 Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu 发布。
1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (来自 Multimedia Computing Group, Nanjing University) 伴随论文 [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) 由 Zhan Tong, Yibing Song, Jue Wang, Limin Wang 发布。
1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (来自 NAVER AI Lab/Kakao Enterprise/Kakao Brain) 伴随论文 [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) 由 Wonjae Kim, Bokyung Son, Ildoo Kim 发布。
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (来自 Google AI) 伴随论文 [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) 由 Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby 发布。
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (来自 UCLA NLP) 伴随论文 [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) 由 Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang 发布。
1. **[ViTMAE](https://huggingface.co/docs/transformers/main/model_doc/vit_mae)** (来自 Meta AI) 伴随论文 [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) 由 Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick 发布。
1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (来自 Meta AI) 伴随论文 [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) 由 Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick 发布。
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (来自 Facebook AI) 伴随论文 [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) 由 Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli 发布。
1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (来自 Facebook AI) 伴随论文 [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) 由 Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino 发布。
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (来自 Facebook AI) 伴随论文 [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) 由 Qiantong Xu, Alexei Baevski, Michael Auli 发布。
1. **[WavLM](https://huggingface.co/docs/transformers/main/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (来自 Microsoft Research) 伴随论文 [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) 由 Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling 发布。
1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (来自 Facebook) 伴随论文 [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) 由 Guillaume Lample and Alexis Conneau 发布。
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。
@ -342,7 +369,8 @@ conda install -c huggingface transformers
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (来自 Google/CMU) 伴随论文 [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) 由 Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le 发布。
1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (来自 Facebook AI) 伴随论文 [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) 由 Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli 发布。
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (来自 Facebook AI) 伴随论文 [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) 由 Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli 发布。
1. **[YOSO](https://huggingface.co/docs/transformers/main/model_doc/yoso)** (来自 the University of Wisconsin - Madison) 伴随论文 [You Only Sample (Almost) 由 Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh 发布。
1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (来自 Huazhong University of Science & Technology) 伴随论文 [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) 由 Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu 发布。
1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (来自 the University of Wisconsin - Madison) 伴随论文 [You Only Sample (Almost) 由 Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh 发布。
1. 想要贡献新的模型?我们这里有一份**详细指引和模板**来引导你添加新的模型。你可以在 [`templates`](./templates) 目录中找到他们。记得查看 [贡献指南](./CONTRIBUTING.md) 并在开始写 PR 前联系维护人员或开一个新的 issue 来获得反馈。
要检查某个模型是否已有 Flax、PyTorch 或 TensorFlow 的实现,或其是否在 🤗 Tokenizers 库中有对应词符化器tokenizer敬请参阅[此表](https://huggingface.co/docs/transformers/index#supported-frameworks)。
@ -357,7 +385,7 @@ conda install -c huggingface transformers
| [文档](https://huggingface.co/transformers/) | 完整的 API 文档和教程 |
| [任务总结](https://huggingface.co/docs/transformers/task_summary) | 🤗 Transformers 支持的任务 |
| [预处理教程](https://huggingface.co/docs/transformers/preprocessing) | 使用 `Tokenizer` 来为模型准备数据 |
| [训练和微调](https://huggingface.co/docstransformers/training) | 在 PyTorch/TensorFlow 的训练循环或 `Trainer` API 中使用 🤗 Transformers 提供的模型 |
| [训练和微调](https://huggingface.co/docs/transformers/training) | 在 PyTorch/TensorFlow 的训练循环或 `Trainer` API 中使用 🤗 Transformers 提供的模型 |
| [快速上手:微调和用例脚本](https://github.com/huggingface/transformers/tree/main/examples) | 为各种任务提供的用例脚本 |
| [模型分享和上传](https://huggingface.co/docs/transformers/model_sharing) | 和社区上传和分享你微调的模型 |
| [迁移](https://huggingface.co/docs/transformers/migration) | 从 `pytorch-transformers` 或 `pytorch-pretrained-bert` 迁移到 🤗 Transformers |

View File

@ -185,7 +185,7 @@ Tokenizer 為所有的預訓練模型提供了預處理,並可以直接轉換
- 對所有模型使用的制式化API
1. 更低的運算成本,更少的碳排放:
- 研究人員可以分享訓練的模型而非從頭開始訓練
- 研究人員可以分享訓練的模型而非每次從頭開始訓練
- 工程師可以減少計算時間以及生產成本
- 數十種模型架構、兩千多個預訓練模型、100多種語言支援
@ -257,16 +257,19 @@ conda install -c huggingface transformers
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/main/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[Data2Vec](https://huggingface.co/docs/transformers/main/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
@ -275,51 +278,69 @@ conda install -c huggingface transformers
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) and a German version of DistilBERT.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER) released with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](https://huggingface.co/docs/transformers/model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](https://huggingface.co/docs/transformers/main/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released with the paper [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/docs/transformers/main/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/main/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/main/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noahs Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[PEGASUS-X](https://huggingface.co/docs/transformers/model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/main/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/main/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](https://huggingface.co/docs/transformers/main/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](https://huggingface.co/docs/transformers/main/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
@ -328,24 +349,30 @@ conda install -c huggingface transformers
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook) released with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University) released with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/main/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released with the paper [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](https://huggingface.co/docs/transformers/main/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](https://huggingface.co/docs/transformers/main/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[ViLT](https://huggingface.co/docs/transformers/main/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](https://huggingface.co/docs/transformers/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](https://huggingface.co/docs/transformers/main/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/main/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[X-CLIP](https://huggingface.co/docs/transformers/model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
@ -354,7 +381,8 @@ conda install -c huggingface transformers
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOSO](https://huggingface.co/docs/transformers/main/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. 想要貢獻新的模型?我們這裡有一份**詳細指引和模板**來引導你加入新的模型。你可以在 [`templates`](./templates) 目錄中找到它們。記得查看[貢獻指引](./CONTRIBUTING.md)並在開始寫 PR 前聯繫維護人員或開一個新的 issue 來獲得 feedbacks。
要檢查某個模型是否已有 Flax、PyTorch 或 TensorFlow 的實作,或其是否在🤗 Tokenizers 函式庫中有對應的 tokenizer敬請參閱[此表](https://huggingface.co/docs/transformers/index#supported-frameworks)。

View File

@ -3,20 +3,53 @@ LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
# Use login shell to read variables from `~/.profile` (to pass dynamic created variables between RUN commands)
SHELL ["sh", "-lc"]
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
# to be used as arguments for docker build (so far).
ARG PYTORCH='1.12.1'
# (not always a valid torch version)
ARG INTEL_TORCH_EXT='1.11.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu113'
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime]
RUN python3 -m pip install --no-cache-dir -U torch tensorflow
# TODO: Handle these in a python utility script
RUN [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile
RUN echo torch=$VERSION
# `torchvision` and `torchaudio` should be installed along with `torch`, especially for nightly build.
# Currently, let's just use their latest releases (when `torch` is installed with a release version)
# TODO: We might need to specify proper versions that work with a specific torch version (especially for past CI).
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
RUN python3 -m pip install --no-cache-dir -U tensorflow
RUN python3 -m pip uninstall -y flax jax
RUN python3 -m pip install --no-cache-dir torch-scatter -f https://data.pyg.org/whl/torch-$(python3 -c "from torch import version; print(version.__version__.split('+')[0])")+cu102.html
# Use installed torch version for `torch-scatter` to avid to deal with PYTORCH='pre'.
# If torch is nightly version, the link is likely to be invalid, but the installation falls back to the latest torch-scatter
RUN python3 -m pip install --no-cache-dir torch-scatter -f https://data.pyg.org/whl/torch-$(python3 -c "from torch import version; print(version.__version__.split('+')[0])")+$CUDA.html
RUN python3 -m pip install --no-cache-dir intel_extension_for_pytorch==$INTEL_TORCH_EXT+cpu -f https://software.intel.com/ipex-whl-stable
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract https://github.com/kpu/kenlm/archive/master.zip
RUN python3 -m pip install -U "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
# Add bitsandbytes for mixed int8 testing
RUN python3 -m pip install --no-cache-dir bitsandbytes
RUN python3 -m pip install --no-cache-dir decord
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

View File

@ -23,4 +23,4 @@ COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]
CMD ["/bin/bash"]

View File

@ -15,5 +15,6 @@ RUN python3 -m pip install --no-cache-dir torchvision git+https://github.com/fac
RUN python3 -m pip install --no-cache-dir pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
RUN python3 -m pip install -U "itsdangerous<2.1.0"
RUN doc-builder build transformers transformers/docs/source --build_dir doc-build-dev --notebook_dir notebooks/transformers_doc --clean --version pr_$PR_NUMBER
# Test if the image could successfully build the doc. before publishing the image
RUN doc-builder build transformers transformers/docs/source/en --build_dir doc-build-dev --notebook_dir notebooks/transformers_doc --clean
RUN rm -rf doc-build-dev

View File

@ -0,0 +1,43 @@
ARG BASE_DOCKER_IMAGE="nvidia/cuda:11.2.2-cudnn8-devel-ubuntu20.04"
FROM $BASE_DOCKER_IMAGE
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
# Use login shell to read variables from `~/.profile` (to pass dynamic created variables between RUN commands)
SHELL ["sh", "-lc"]
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime]
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
ARG FRAMEWORK
ARG VERSION
# Remove all frameworks
# (`accelerate` requires `torch`, and this causes import issues for TF-only testing)
RUN python3 -m pip uninstall -y torch torchvision torchaudio accelerate tensorflow jax flax
# Get the libraries and their versions to install, and write installation command to `~/.profile`.
RUN python3 ./transformers/utils/past_ci_versions.py --framework $FRAMEWORK --version $VERSION
# Install the target framework
RUN echo "INSTALL_CMD = $INSTALL_CMD"
RUN $INSTALL_CMD
# Having installation problems for torch-scatter with torch <= 1.6. Disable so we have the same set of tests.
# (This part will be removed once the logic of using `past_ci_versions.py` is used in other Dockerfile files.)
# # Use installed torch version for `torch-scatter`.
# # (The env. variable $CUDA is defined in `past_ci_versions.py`)
# RUN [ "$FRAMEWORK" = "pytorch" ] && python3 -m pip install --no-cache-dir torch-scatter -f https://data.pyg.org/whl/torch-$(python3 -c "from torch import version; print(version.__version__.split('+')[0])")+$CUDA.html || echo "torch-scatter not to be installed"
RUN python3 -m pip install -U "itsdangerous<2.1.0"

View File

@ -3,16 +3,30 @@ LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
ARG PYTORCH='1.12.1'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu113'
RUN apt -y update
RUN apt install -y libaio-dev
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[testing,deepspeed]
RUN git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build && \
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
# Install latest release PyTorch
# (PyTorch must be installed before pre-compiling any DeepSpeed c++/cuda ops.)
# (https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
RUN python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir ./transformers[deepspeed-testing]
# Pre-build **latest** DeepSpeed, so it would be ready for testing (otherwise, the 1st deepspeed test will timeout)
RUN python3 -m pip uninstall -y deepspeed
# This has to be run (again) inside the GPU VMs running the tests.
# The installation works here, but some tests fail, if we don't pre-build deepspeed again in the VMs running the tests.
# TODO: Find out why test fail.
RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.

View File

@ -0,0 +1,35 @@
FROM nvcr.io/nvidia/pytorch:21.03-py3
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu113'
RUN apt -y update
RUN apt install -y libaio-dev
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
# Install **nightly** release PyTorch (flag `--pre`)
# (PyTorch must be installed before pre-compiling any DeepSpeed c++/cuda ops.)
# (https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
RUN python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
RUN python3 -m pip install --no-cache-dir ./transformers[deepspeed-testing]
# Pre-build **nightly** release of DeepSpeed, so it would be ready for testing (otherwise, the 1st deepspeed test will timeout)
RUN python3 -m pip uninstall -y deepspeed
# This has to be run inside the GPU VMs running the tests. (So far, it fails here due to GPU checks during compilation.)
# Issue: https://github.com/microsoft/DeepSpeed/issues/2010
# RUN git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build && \
# DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
# Disable for now as deepspeed is not installed above. To be enabled once the issue is fixed.
# RUN python3 -c "from deepspeed.launcher.runner import main"

View File

@ -12,12 +12,17 @@ RUN git clone https://github.com/huggingface/transformers && cd transformers &&
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing]
# If set to nothing, will install the latest version
ARG PYTORCH=''
ARG PYTORCH='1.12.1'
ARG TORCH_VISION=''
ARG TORCH_AUDIO=''
RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/cu113
RUN [ ${#TORCH_VISION} -gt 0 ] && VERSION='torchvision=='TORCH_VISION'.*' || VERSION='torchvision'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/cu113
RUN [ ${#TORCH_AUDIO} -gt 0 ] && VERSION='torchaudio=='TORCH_AUDIO'.*' || VERSION='torchaudio'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/cu113
RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; python3 -m pip install --no-cache-dir -U $VERSION
RUN python3 -m pip uninstall -y tensorflow flax
RUN python3 -m pip install --no-cache-dir torch-scatter -f https://data.pyg.org/whl/torch-$(python3 -c "from torch import version; print(version.__version__.split('+')[0])")+cu102.html
RUN python3 -m pip install --no-cache-dir torch-scatter -f https://data.pyg.org/whl/torch-$(python3 -c "from torch import version; print(version.__version__.split('+')[0])")+cu113.html
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract https://github.com/kpu/kenlm/archive/master.zip
RUN python3 -m pip install -U "itsdangerous<2.1.0"

View File

@ -22,4 +22,4 @@ COPY . transformers/
RUN cd transformers/ && \
python3 -m pip install --no-cache-dir .
CMD ["/bin/bash"]
CMD ["/bin/bash"]

View File

@ -43,18 +43,39 @@ Once you have setup the `doc-builder` and additional packages, you can generate
typing the following command:
```bash
doc-builder build transformers docs/source/ --build_dir ~/tmp/test-build
doc-builder build transformers docs/source/en/ --build_dir ~/tmp/test-build
```
You can adapt the `--build_dir` to set any temporary folder that you prefer. This command will create it and generate
the MDX files that will be rendered as the documentation on the main website. You can inspect them in your favorite
Markdown editor.
## Previewing the documentation
To preview the docs, first install the `watchdog` module with:
```bash
pip install watchdog
```
Then run the following command:
```bash
doc-builder preview {package_name} {path_to_docs}
```
For example:
```bash
doc-builder preview transformers docs/source/en/
```
The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment to a link where the documentation with your changes lives.
---
**NOTE**
It's not possible to see locally how the final documentation will look like for now. Once you have opened a PR, you
will see a bot add a comment to a link where the documentation with your changes lives.
The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` & restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
---
@ -108,6 +129,11 @@ Make sure to put your new file under the proper section. It's unlikely to go in
depending on the intended targets (beginners, more advanced users or researchers) it should go in section two, three or
four.
### Translating
When translating, refer to the guide at [./TRANSLATING.md](https://github.com/huggingface/transformers/blob/main/docs/TRANSLATING.md).
### Adding a new model
When adding a new model:
@ -402,4 +428,4 @@ Here are a few tips to help you debug the doctests and make them pass:
* whitespace: one give whitespace (space, tabulation, new line) is equivalent to any number of whitespace, so you can add new lines where there are spaces to make your output more readable.
* numerical values: you should never put more than 4 or 5 digits to expected results as different setups or library versions might get you slightly different results. `doctest` is configure to ignore any difference lower than the precision to which you wrote (so 1e-4 if you write 4 digits).
- Don't leave a block of code that is very long to execute. If you can't make it fast, you can either not use the doctest syntax on it (so that it's ignored), or if you want to use the doctest syntax to show the results, you can add a comment `# doctest: +SKIP` at the end of the lines of code too long to execute
- Each line of code that produces a result needs to have that result written below. You can ignore an output if you don't want to show it in your code example by adding a comment ` # doctest: +IGNORE_RESULT` at the end of the line of code produing it.
- Each line of code that produces a result needs to have that result written below. You can ignore an output if you don't want to show it in your code example by adding a comment ` # doctest: +IGNORE_RESULT` at the end of the line of code producing it.

57
docs/TRANSLATING.md Normal file
View File

@ -0,0 +1,57 @@
### Translating the Transformers documentation into your language
As part of our mission to democratize machine learning, we'd love to make the Transformers library available in many more languages! Follow the steps below if you want to help translate the documentation into your language 🙏.
**🗞️ Open an issue**
To get started, navigate to the [Issues](https://github.com/huggingface/transformers/issues) page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the "Translation template" from the "New issue" button.
Once an issue exists, post a comment to indicate which chapters you'd like to work on, and we'll add your name to the list.
**🍴 Fork the repository**
First, you'll need to [fork the Transformers repo](https://docs.github.com/en/get-started/quickstart/fork-a-repo). You can do this by clicking on the **Fork** button on the top-right corner of this repo's page.
Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows:
```bash
git clone https://github.com/YOUR-USERNAME/transformers.git
```
**📋 Copy-paste the English version with a new language code**
The documentation files are in one leading directory:
- [`docs/source`](https://github.com/huggingface/transformers/tree/main/docs/source): All the documentation materials are organized here by language.
You'll only need to copy the files in the [`docs/source/en`](https://github.com/huggingface/transformers/tree/main/docs/source/en) directory, so first navigate to your fork of the repo and run the following:
```bash
cd ~/path/to/transformers/docs
cp -r source/en source/LANG-ID
```
Here, `LANG-ID` should be one of the ISO 639-1 or ISO 639-2 language codes -- see [here](https://www.loc.gov/standards/iso639-2/php/code_list.php) for a handy table.
**✍️ Start translating**
The fun part comes - translating the text!
The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.
> 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/LANG-ID/` directory!
The fields you should add are `local` (with the name of the file containing the translation; e.g. `autoclass_tutorial`), and `title` (with the title of the doc in your language; e.g. `Load pretrained instances with an AutoClass`) -- as a reference, here is the `_toctree.yml` for [English](https://github.com/huggingface/transformers/blob/main/docs/source/en/_toctree.yml):
```yaml
- sections:
- local: pipeline_tutorial # Do not change this! Use the same name for your .md file
title: Pipelines for inference # Translate this!
...
title: Tutorials # Translate this!
```
Once you have translated the `_toctree.yml` file, you can start translating the [MDX](https://mdxjs.com/) files associated with your docs chapter.
> 🙋 If you'd like others to help you with the translation, you should [open an issue](https://github.com/huggingface/transformers/issues) and tag @sgugger.

14
docs/source/de/_config.py Normal file
View File

@ -0,0 +1,14 @@
# docstyle-ignore
INSTALL_CONTENT = """
# Transformers installation
! pip install transformers datasets
# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/transformers.git
"""
notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}]
black_avoid_patterns = {
"{processor_class}": "FakeProcessorClass",
"{model_class}": "FakeModelClass",
"{object_class}": "FakeObjectClass",
}

View File

@ -0,0 +1,12 @@
- sections:
- local: index
title: 🤗 Transformers
- local: quicktour
title: Schnellstart
- local: installation
title: Installation
title: Erste Schritte
- sections:
- local: pipeline_tutorial
title: Pipelines für Inferenzen
title: Tutorials

322
docs/source/de/index.mdx Normal file
View File

@ -0,0 +1,322 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# 🤗 Transformers
Maschinelles Lernen auf dem neuesten Stand der Technik für PyTorch, TensorFlow und JAX.
🤗 Transformers bietet APIs zum einfachen Herunterladen und Trainieren von vortrainierten Modellen auf dem neuesten Stand der Technik. Die Verwendung von vortrainierten Modellen kann Rechenkosten sparen und den CO2-Fußabdruck reduzieren und Zeit sparen, die für das Training eines Modells von Grund auf benötigt wird. Die Modelle können für verschiedene Modalitäten verwendet werden, wie z. B.:
* 📝 Text: Textklassifizierung, Informationsextrahierung, Beantwortung von Fragen, Zusammenfassung, Übersetzung und Texterstellung in über 100 Sprachen.
* 🖼️ Bilder: Bildklassifizierung, Objekterkennung und Segmentierung.
* 🗣️ Audio: Spracherkennung und Audioklassifizierung.
* 🐙 Multimodal: Beantwortung von Tabellenfragen, optische Zeichenerkennung, Informationsextraktion aus gescannten Dokumenten, Videoklassifizierung und Beantwortung visueller Fragen.
Unsere Bibliothek unterstützt die nahtlose Integration von drei der beliebtesten Deep-Learning-Bibliotheken: [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/) und [JAX](https://jax.readthedocs.io/en/latest/). Trainieren Sie Ihr Modell in drei Codezeilen in einem Framework und laden Sie es zur Inferenz mit einem anderen.
Jede 🤗 Transformers-Architektur ist in einem eigenständigen Python-Modul definiert, so dass sie leicht für Forschung und Experimente angepasst werden kann.
## Wenn Sie auf der Suche nach individueller Unterstützung durch das Hugging Face-Team sind
<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>
## Inhalt
Die Dokumentation ist in fünf Teile gegliedert:
- **GET STARTED** enthält eine kurze Tour und Installationsanweisungen, um mit 🤗 Transformers loszulegen.
- **TUTORIALS** sind ein hervorragender Ausgangspunkt, wenn Sie neu in unserer Bibliothek sind. Dieser Abschnitt hilft Ihnen, die grundlegenden Fähigkeiten zu erlangen, die Sie benötigen, um mit 🤗 Transformers zu arbeiten.
- **HOW-TO GUIDES** zeigen Ihnen, wie Sie ein bestimmtes Ziel erreichen können, z. B. die Feinabstimmung eines vortrainierten Modells für die Sprachmodellierung oder die Erstellung eines benutzerdefinierten Modellkopfs.
- **KONZEPTUELLE ANLEITUNGEN** bietet weitere Diskussionen und Erklärungen zu den zugrunde liegenden Konzepten und Ideen hinter Modellen, Aufgaben und der Designphilosophie von 🤗 Transformers.
- **API** beschreibt jede Klasse und Funktion, gruppiert in:
- **MAIN CLASSES** für die Hauptklassen, die die wichtigsten APIs der Bibliothek darstellen.
- MODELLE** für die Klassen und Funktionen, die zu jedem in der Bibliothek implementierten Modell gehören.
- **INTERNAL HELPERS** für die Klassen und Funktionen, die wir intern verwenden.
Die Bibliothek enthält derzeit JAX-, PyTorch- und TensorFlow-Implementierungen, vortrainierte Modellgewichte, Nutzungsskripte und Konvertierungsprogramme für die folgenden Modelle.
### Unterstütze Modelle
<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! -->
1. **[ALBERT](model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GroupViT](model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[mBART](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[MVP](model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](model_doc/nezha)** (from Huawei Noahs Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer](model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBERT](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOLOS](model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
### Unterstützte Frameworks
Die folgende Tabelle zeigt die derzeitige Unterstützung in der Bibliothek für jedes dieser Modelle, unabhängig davon, ob sie einen Python
Tokenizer haben (als "langsam" bezeichnet), ein "schneller" Tokenizer, der von der 🤗 Tokenizers Bibliothek unterstützt wird, ob sie Unterstützung in Jax (via
Flax), PyTorch, und/oder TensorFlow haben.
<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!-->
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
|:---------------------------:|:--------------:|:--------------:|:---------------:|:------------------:|:------------:|
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ | ✅ |
| BEiT | ❌ | ❌ | ✅ | ❌ | ✅ |
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
| BigBird-Pegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ |
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ |
| BLOOM | ❌ | ✅ | ✅ | ❌ | ❌ |
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| CANINE | ✅ | ❌ | ✅ | ❌ | ❌ |
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ |
| CodeGen | ✅ | ✅ | ✅ | ❌ | ❌ |
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ConvNeXT | ❌ | ❌ | ✅ | ✅ | ❌ |
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
| CvT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecAudio | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecText | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecVision | ❌ | ❌ | ✅ | ✅ | ❌ |
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
| DeBERTa-v2 | ✅ | ✅ | ✅ | ✅ | ❌ |
| Decision Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeiT | ❌ | ❌ | ✅ | ✅ | ❌ |
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ |
| DPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
| FLAVA | ❌ | ❌ | ✅ | ❌ | ❌ |
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ |
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ |
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
| GPT NeoX | ❌ | ✅ | ✅ | ❌ | ❌ |
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ |
| GroupViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LayoutLMv3 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
| LeViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| LongT5 | ❌ | ❌ | ✅ | ❌ | ✅ |
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| M-CTC-T | ❌ | ❌ | ✅ | ❌ | ❌ |
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ |
| MaskFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ |
| Megatron-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| MobileViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| MT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| MVP | ✅ | ✅ | ✅ | ❌ | ❌ |
| Nezha | ❌ | ❌ | ✅ | ❌ | ❌ |
| Nyströmformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| OPT | ❌ | ❌ | ✅ | ✅ | ✅ |
| OWL-ViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ |
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ |
| PoolFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| QDQBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ |
| REALM | ✅ | ✅ | ✅ | ❌ | ❌ |
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
| RegNet | ❌ | ❌ | ✅ | ✅ | ❌ |
| RemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ResNet | ❌ | ❌ | ✅ | ✅ | ❌ |
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ |
| SegFormer | ❌ | ❌ | ✅ | ✅ | ❌ |
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ |
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| Speech2Text | ✅ | ❌ | ✅ | ✅ | ❌ |
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ |
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| Swin Transformer | ❌ | ❌ | ✅ | ✅ | ❌ |
| Swin Transformer V2 | ❌ | ❌ | ✅ | ❌ | ❌ |
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| TAPAS | ✅ | ❌ | ✅ | ✅ | ❌ |
| Trajectory Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
| TrOCR | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeech | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeechSat | ❌ | ❌ | ✅ | ❌ | ❌ |
| VAN | ❌ | ❌ | ✅ | ❌ | ❌ |
| VideoMAE | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViLT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Vision Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| VisionTextDualEncoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| VisualBERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ |
| ViTMAE | ❌ | ❌ | ✅ | ✅ | ❌ |
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ |
| Wav2Vec2-Conformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| WavLM | ❌ | ❌ | ✅ | ❌ | ❌ |
| XGLM | ✅ | ✅ | ✅ | ❌ | ✅ |
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
| XLM-ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| XLM-RoBERTa-XL | ❌ | ❌ | ✅ | ❌ | ❌ |
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| YOLOS | ❌ | ❌ | ✅ | ❌ | ❌ |
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |
<!-- End table-->

View File

@ -0,0 +1,246 @@
<!---
Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Installation
Installieren Sie 🤗 Transformers für die Deep-Learning-Bibliothek, mit der Sie arbeiten, richten Sie Ihren Cache ein und konfigurieren Sie 🤗 Transformers optional für den Offline-Betrieb.
🤗 Transformers wurde unter Python 3.6+, PyTorch 1.1.0+, TensorFlow 2.0+, und Flax getestet. Folgen Sie den Installationsanweisungen unten für die von Ihnen verwendete Deep-Learning-Bibliothek:
* [PyTorch](https://pytorch.org/get-started/locally/) installation instructions.
* [TensorFlow 2.0](https://www.tensorflow.org/install/pip) installation instructions.
* [Flax](https://flax.readthedocs.io/en/latest/) installation instructions.
## Installation mit pip
Sie sollten 🤗 Transformers in einer [virtuellen Umgebung](https://docs.python.org/3/library/venv.html) installieren. Wenn Sie mit virtuellen Python-Umgebungen nicht vertraut sind, werfen Sie einen Blick auf diese [Anleitung](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Eine virtuelle Umgebung macht es einfacher, verschiedene Projekte zu verwalten und Kompatibilitätsprobleme zwischen Abhängigkeiten zu vermeiden.
Beginnen wir mit der Erstellung einer virtuellen Umgebung in Ihrem Projektverzeichnis:
```bash
python -m venv .env
```
Aktivieren wir die virtuelle Umgebung. Unter Linux und MacOs:
```bash
source .env/bin/activate
```
Aktivieren wir die virtuelle Umgebung unter Windows
```bash
.env/Scripts/activate
```
Jetzt können wir die 🤗 Transformers mit dem folgenden Befehl installieren:
```bash
pip install transformers
```
Bei reiner CPU-Unterstützung können wir 🤗 Transformers und eine Deep-Learning-Bibliothek bequem in einer Zeile installieren. Installieren wir zum Beispiel 🤗 Transformers und PyTorch mit:
```bash
pip install transformers[torch]
```
🤗 Transformers und TensorFlow 2.0:
```bash
pip install transformers[tf-cpu]
```
🤗 Transformers und Flax:
```bash
pip install transformers[flax]
```
Überprüfen wir abschließend, ob 🤗 Transformers ordnungsgemäß installiert wurde, indem wir den folgenden Befehl ausführen. Es wird ein vortrainiertes Modell heruntergeladen:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
```
Dann wird die Kategorie und die Wahrscheinlichkeit ausgegeben:
```bash
[{'label': 'POSITIVE', 'score': 0.9998704791069031}]
```
## Installation aus dem Code
Installieren wir 🤗 Transformers aus dem Quellcode mit dem folgenden Befehl:
```bash
pip install git+https://github.com/huggingface/transformers
```
Dieser Befehl installiert die aktuelle `main` Version und nicht die neueste `stable` Version. Die `main`-Version ist nützlich, um mit den neuesten Entwicklungen Schritt zu halten. Zum Beispiel, wenn ein Fehler seit der letzten offiziellen Version behoben wurde, aber eine neue Version noch nicht veröffentlicht wurde. Das bedeutet jedoch, dass die "Hauptversion" nicht immer stabil ist. Wir bemühen uns, die Hauptversion einsatzbereit zu halten, und die meisten Probleme werden normalerweise innerhalb weniger Stunden oder eines Tages behoben. Wenn Sie auf ein Problem stoßen, öffnen Sie bitte ein [Issue] (https://github.com/huggingface/transformers/issues), damit wir es noch schneller beheben können!
Überprüfen wir, ob 🤗 Transformers richtig installiert wurde, indem Sie den folgenden Befehl ausführen:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I love you'))"
```
## Editierbare Installation
Sie benötigen eine bearbeitbare Installation, wenn Sie:
* die "Haupt"-Version des Quellcodes verwenden möchten.
* Zu 🤗 Transformers beitragen und Änderungen am Code testen wollen.
Klonen Sie das Repository und installieren 🤗 Transformers mit den folgenden Befehlen:
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```
Diese Befehle verknüpfen den Ordner, in den Sie das Repository geklont haben, mit den Pfaden Ihrer Python-Bibliotheken. Python wird nun in dem Ordner suchen, in den Sie geklont haben, zusätzlich zu den normalen Bibliothekspfaden. Wenn zum Beispiel Ihre Python-Pakete normalerweise in `~/anaconda3/envs/main/lib/python3.7/site-packages/` installiert sind, wird Python auch den Ordner durchsuchen, in den Sie geklont haben: `~/transformers/`.
<Tip warning={true}>
Sie müssen den Ordner `transformers` behalten, wenn Sie die Bibliothek weiter verwenden wollen.
</Tip>
Jetzt können Sie Ihren Klon mit dem folgenden Befehl ganz einfach auf die neueste Version von 🤗 Transformers aktualisieren:
```bash
cd ~/transformers/
git pull
```
Ihre Python-Umgebung wird beim nächsten Ausführen die `main`-Version von 🤗 Transformers finden.
## Installation mit conda
Installation von dem conda Kanal `huggingface`:
```bash
conda install -c huggingface transformers
```
## Cache Einrichtung
Vorgefertigte Modelle werden heruntergeladen und lokal zwischengespeichert unter: `~/.cache/huggingface/hub`. Dies ist das Standardverzeichnis, das durch die Shell-Umgebungsvariable "TRANSFORMERS_CACHE" vorgegeben ist. Unter Windows wird das Standardverzeichnis durch `C:\Benutzer\Benutzername\.cache\huggingface\hub` angegeben. Sie können die unten aufgeführten Shell-Umgebungsvariablen - in der Reihenfolge ihrer Priorität - ändern, um ein anderes Cache-Verzeichnis anzugeben:
1. Shell-Umgebungsvariable (Standard): `HUGGINGFACE_HUB_CACHE` oder `TRANSFORMERS_CACHE`.
2. Shell-Umgebungsvariable: `HF_HOME`.
3. Shell-Umgebungsvariable: `XDG_CACHE_HOME` + `/huggingface`.
<Tip>
Transformers verwendet die Shell-Umgebungsvariablen `PYTORCH_TRANSFORMERS_CACHE` oder `PYTORCH_PRETRAINED_BERT_CACHE`, wenn Sie von einer früheren Iteration dieser Bibliothek kommen und diese Umgebungsvariablen gesetzt haben, sofern Sie nicht die Shell-Umgebungsvariable `TRANSFORMERS_CACHE` angeben.
</Tip>
## Offline Modus
Transformers ist in der Lage, in einer Firewall- oder Offline-Umgebung zu laufen, indem es nur lokale Dateien verwendet. Setzen Sie die Umgebungsvariable `TRANSFORMERS_OFFLINE=1`, um dieses Verhalten zu aktivieren.
<Tip>
Fügen sie [🤗 Datasets](https://huggingface.co/docs/datasets/) zu Ihrem Offline-Trainingsworkflow hinzufügen, indem Sie die Umgebungsvariable `HF_DATASETS_OFFLINE=1` setzen.
</Tip>
So würden Sie beispielsweise ein Programm in einem normalen Netzwerk mit einer Firewall für externe Instanzen mit dem folgenden Befehl ausführen:
```bash
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
Führen Sie das gleiche Programm in einer Offline-Instanz mit aus:
```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
Das Skript sollte nun laufen, ohne sich aufzuhängen oder eine Zeitüberschreitung abzuwarten, da es weiß, dass es nur nach lokalen Dateien suchen soll.
### Abrufen von Modellen und Tokenizern zur Offline-Verwendung
Eine andere Möglichkeit, 🤗 Transformers offline zu verwenden, besteht darin, die Dateien im Voraus herunterzuladen und dann auf ihren lokalen Pfad zu verweisen, wenn Sie sie offline verwenden müssen. Es gibt drei Möglichkeiten, dies zu tun:
* Laden Sie eine Datei über die Benutzeroberfläche des [Model Hub](https://huggingface.co/models) herunter, indem Sie auf das ↓-Symbol klicken.
![download-icon](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/download-icon.png)
* Verwenden Sie den [PreTrainedModel.from_pretrained] und [PreTrainedModel.save_pretrained] Workflow:
1. Laden Sie Ihre Dateien im Voraus mit [`PreTrainedModel.from_pretrained`] herunter:
```py
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")
```
2. Speichern Sie Ihre Dateien in einem bestimmten Verzeichnis mit [`PreTrainedModel.save_pretrained`]:
```py
>>> tokenizer.save_pretrained("./your/path/bigscience_t0")
>>> model.save_pretrained("./your/path/bigscience_t0")
```
3. Wenn Sie nun offline sind, laden Sie Ihre Dateien mit [`PreTrainedModel.from_pretrained`] aus dem bestimmten Verzeichnis:
```py
>>> tokenizer = AutoTokenizer.from_pretrained("./your/path/bigscience_t0")
>>> model = AutoModel.from_pretrained("./your/path/bigscience_t0")
```
* Programmatisches Herunterladen von Dateien mit der [huggingface_hub](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub) Bibliothek:
1. Installieren Sie die "huggingface_hub"-Bibliothek in Ihrer virtuellen Umgebung:
```bash
python -m pip install huggingface_hub
```
2. Verwenden Sie die Funktion [`hf_hub_download`](https://huggingface.co/docs/hub/adding-a-library#download-files-from-the-hub), um eine Datei in einen bestimmten Pfad herunterzuladen. Der folgende Befehl lädt zum Beispiel die Datei "config.json" aus dem Modell [T0](https://huggingface.co/bigscience/T0_3B) in den gewünschten Pfad herunter:
```py
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="bigscience/T0_3B", filename="config.json", cache_dir="./your/path/bigscience_t0")
```
Sobald Ihre Datei heruntergeladen und lokal zwischengespeichert ist, geben Sie den lokalen Pfad an, um sie zu laden und zu verwenden:
```py
>>> from transformers import AutoConfig
>>> config = AutoConfig.from_pretrained("./your/path/bigscience_t0/config.json")
```
<Tip>
Weitere Informationen zum Herunterladen von Dateien, die auf dem Hub gespeichert sind, finden Sie im Abschnitt [Wie man Dateien vom Hub herunterlädt] (https://huggingface.co/docs/hub/how-to-downstream).
</Tip>

View File

@ -0,0 +1,171 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Pipelines für Inferenzen
Die [`pipeline`] macht es einfach, jedes beliebige Modell aus dem [Hub](https://huggingface.co/models) für die Inferenz auf jede Sprache, Computer Vision, Sprache und multimodale Aufgaben zu verwenden. Selbst wenn Sie keine Erfahrung mit einer bestimmten Modalität haben oder nicht mit dem zugrundeliegenden Code hinter den Modellen vertraut sind, können Sie sie mit der [`pipeline`] für Inferenzen verwenden! In diesem Beispiel lernen Sie, wie:
* Eine [`pipeline`] für Inferenz zu verwenden.
* Einen bestimmten Tokenizer oder ein bestimmtes Modell zu verwenden.
* Eine [`pipeline`] für Audio-, Vision- und multimodale Aufgaben zu verwenden.
<Tip>
Eine vollständige Liste der unterstützten Aufgaben und verfügbaren Parameter finden Sie in der [`pipeline`]-Dokumentation.
</Tip>
## Verwendung von Pipelines
Obwohl jede Aufgabe eine zugehörige [`pipeline`] hat, ist es einfacher, die allgemeine [`pipeline`]-Abstraktion zu verwenden, die alle aufgabenspezifischen Pipelines enthält. Die [`pipeline`] lädt automatisch ein Standardmodell und eine Vorverarbeitungsklasse, die für Ihre Aufgabe inferenzfähig ist.
1. Beginnen Sie mit der Erstellung einer [`pipeline`] und geben Sie eine Inferenzaufgabe an:
```py
>>> from transformers import pipeline
>>> generator = pipeline(task="text-generation")
```
2. Übergeben Sie Ihren Eingabetext an die [`pipeline`]:
```py
>>> generator(
... "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone"
... ) # doctest: +SKIP
[{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Iron-priests at the door to the east, and thirteen for the Lord Kings at the end of the mountain'}]
```
Wenn Sie mehr als eine Eingabe haben, übergeben Sie die Eingabe als Liste:
```py
>>> generator(
... [
... "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
... "Nine for Mortal Men, doomed to die, One for the Dark Lord on his dark throne",
... ]
... ) # doctest: +SKIP
```
Alle zusätzlichen Parameter für Ihre Aufgabe können auch in die [`pipeline`] aufgenommen werden. Die Aufgabe `Text-Generierung` hat eine [`~generation_utils.GenerationMixin.generate`]-Methode mit mehreren Parametern zur Steuerung der Ausgabe. Wenn Sie zum Beispiel mehr als eine Ausgabe erzeugen wollen, setzen Sie den Parameter `num_return_sequences`:
```py
>>> generator(
... "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone",
... num_return_sequences=2,
... ) # doctest: +SKIP
```
### Wählen Sie ein Modell und einen Tokenizer
Die [`pipeline`] akzeptiert jedes Modell aus dem [Hub] (https://huggingface.co/models). Auf dem Hub gibt es Tags, mit denen Sie nach einem Modell filtern können, das Sie für Ihre Aufgabe verwenden möchten. Sobald Sie ein passendes Modell ausgewählt haben, laden Sie es mit der entsprechenden `AutoModelFor` und [`AutoTokenizer`] Klasse. Laden Sie zum Beispiel die Klasse [`AutoModelForCausalLM`] für eine kausale Sprachmodellierungsaufgabe:
```py
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
```
Erstellen Sie eine [`pipeline`] für Ihre Aufgabe, und geben Sie das Modell und den Tokenizer an, die Sie geladen haben:
```py
>>> from transformers import pipeline
>>> generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
```
Übergeben Sie Ihren Eingabetext an die [`pipeline`] , um einen Text zu erzeugen:
```py
>>> generator(
... "Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone"
... ) # doctest: +SKIP
[{'generated_text': 'Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Seven for the Dragon-lords (for them to rule in a world ruled by their rulers, and all who live within the realm'}]
```
## Audio-Pipeline
Die [`pipeline`] unterstützt auch Audioaufgaben wie Audioklassifizierung und automatische Spracherkennung.
Lassen Sie uns zum Beispiel die Emotion in diesem Audioclip klassifizieren:
```py
>>> from datasets import load_dataset
>>> import torch
>>> torch.manual_seed(42) # doctest: +IGNORE_RESULT
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> audio_file = ds[0]["audio"]["path"]
```
Finden Sie ein [Audioklassifikation](https://huggingface.co/models?pipeline_tag=audio-classification) Modell auf dem Model Hub für Emotionserkennung und laden Sie es in die [`pipeline`]:
```py
>>> from transformers import pipeline
>>> audio_classifier = pipeline(
... task="audio-classification", model="ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
... )
```
Übergeben Sie die Audiodatei an die [`pipeline`]:
```py
>>> preds = audio_classifier(audio_file)
>>> preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
>>> preds
[{'score': 0.1315, 'label': 'calm'}, {'score': 0.1307, 'label': 'neutral'}, {'score': 0.1274, 'label': 'sad'}, {'score': 0.1261, 'label': 'fearful'}, {'score': 0.1242, 'label': 'happy'}]
```
## Bildverarbeitungs-Pipeline
Die Verwendung einer [`pipeline`] für Bildverarbeitungsaufgaben ist praktisch identisch.
Geben Sie Ihre Aufgabe an und übergeben Sie Ihr Bild an den Klassifikator. Das Bild kann ein Link oder ein lokaler Pfad zu dem Bild sein. Zum Beispiel: Welche Katzenart ist unten abgebildet?
![pipeline-cat-chonk](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg)
```py
>>> from transformers import pipeline
>>> vision_classifier = pipeline(task="image-classification")
>>> preds = vision_classifier(
... images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
... )
>>> preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
>>> preds
[{'score': 0.4335, 'label': 'lynx, catamount'}, {'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'}, {'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'}, {'score': 0.0239, 'label': 'Egyptian cat'}, {'score': 0.0229, 'label': 'tiger cat'}]
```
## Multimodale Pipeline
Die [`pipeline`] unterstützt mehr als eine Modalität. Eine Aufgabe zur Beantwortung visueller Fragen (VQA) kombiniert zum Beispiel Text und Bild. Verwenden Sie einen beliebigen Bildlink und eine Frage, die Sie zu dem Bild stellen möchten. Das Bild kann eine URL oder ein lokaler Pfad zu dem Bild sein.
Wenn Sie zum Beispiel das gleiche Bild wie in der obigen Vision-Pipeline verwenden:
```py
>>> image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
>>> question = "Where is the cat?"
```
Erstellen Sie eine Pipeline für "vqa" und übergeben Sie ihr das Bild und die Frage:
```py
>>> from transformers import pipeline
>>> vqa = pipeline(task="vqa")
>>> preds = vqa(image=image, question=question)
>>> preds = [{"score": round(pred["score"], 4), "answer": pred["answer"]} for pred in preds]
>>> preds
[{'score': 0.9112, 'answer': 'snow'}, {'score': 0.8796, 'answer': 'in snow'}, {'score': 0.6717, 'answer': 'outside'}, {'score': 0.0291, 'answer': 'on ground'}, {'score': 0.027, 'answer': 'ground'}]
```

View File

@ -0,0 +1,428 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Schnellstart
[[open-in-colab]]
Mit 🤗 Transformers können Sie sofort loslegen! Verwenden Sie die [`pipeline`] für schnelle Inferenz und laden Sie schnell ein vortrainiertes Modell und einen Tokenizer mit einer [AutoClass](./model_doc/auto), um Ihre Text-, Bild- oder Audioaufgabe zu lösen.
<Tip>
Alle in der Dokumentation vorgestellten Codebeispiele haben oben links einen Umschalter für PyTorch und TensorFlow. Wenn
nicht, wird erwartet, dass der Code für beide Backends ohne Änderungen funktioniert.
</Tip>
## Pipeline
[`pipeline`] ist der einfachste Weg, ein vortrainiertes Modell für eine bestimmte Aufgabe zu verwenden.
<Youtube id="tiZFewofSLM"/>
Die [`pipeline`] unterstützt viele gängige Aufgaben:
**Text**:
* Stimmungsanalyse: Klassifizierung der Polarität eines gegebenen Textes.
* Textgenerierung (auf Englisch): Generierung von Text aus einer gegebenen Eingabe.
* Name-Entity-Recognition (NER): Kennzeichnung jedes Worts mit der Entität, die es repräsentiert (Person, Datum, Ort usw.).
* Beantwortung von Fragen: Extrahieren der Antwort aus dem Kontext, wenn ein gewisser Kontext und eine Frage gegeben sind.
* Fill-mask: Ausfüllen von Lücken in einem Text mit maskierten Wörtern.
* Zusammenfassung: Erstellung einer Zusammenfassung einer langen Text- oder Dokumentensequenz.
* Übersetzung: Übersetzen eines Textes in eine andere Sprache.
* Merkmalsextraktion: Erstellen einer Tensordarstellung des Textes.
**Bild**:
* Bildklassifizierung: Klassifizierung eines Bildes.
* Bildsegmentierung: Klassifizierung jedes Pixels in einem Bild.
* Objekterkennung: Erkennen von Objekten innerhalb eines Bildes.
**Audio**:
* Audioklassifizierung: Zuweisung eines Labels zu einem bestimmten Audiosegment.
* Automatische Spracherkennung (ASR): Transkription von Audiodaten in Text.
<Tip>
Für mehr Details über die [`pipeline`] und assoziierte Aufgaben, schauen Sie in die Dokumentation [hier](./main_classes/pipelines).
</Tip>
### Verwendung der Pipeline
Im folgenden Beispiel werden Sie die [`pipeline`] für die Stimmungsanalyse verwenden.
Installieren Sie die folgenden Abhängigkeiten, falls Sie dies nicht bereits getan haben:
<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>
Importieren sie die [`pipeline`] und spezifizieren sie die Aufgabe, welche sie lösen möchten:
```py
>>> from transformers import pipeline
>>> classifier = pipeline("sentiment-analysis")
```
Die Pipeline lädt ein standardmäßiges [vortrainiertes Modell] (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) und einen Tokenizer für die Stimmungs-Analyse herunter und speichert sie. Jetzt können Sie den "Klassifikator" auf Ihren Zieltext anwenden:
```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
[{'label': 'POSITIVE', 'score': 0.9998}]
```
For more than one sentence, pass a list of sentences to the [`pipeline`] which returns a list of dictionaries:
```py
>>> results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
>>> for result in results:
... print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
```
Die [`pipeline`] kann auch über einen ganzen Datensatz iterieren. Starten wir mit der Installation der [🤗 Datasets](https://huggingface.co/docs/datasets/) Bibliothek:
```bash
pip install datasets
```
Erstellen wir eine [`pipeline`] mit der Aufgabe die wir lösen und dem Modell welches wir nutzen möchten.
```py
>>> import torch
>>> from transformers import pipeline
>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
```
Als nächstes laden wir den Datensatz (siehe 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) für mehr Details) welches wir nutzen möchten. Zum Beispiel laden wir den [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) Datensatz:
```py
>>> from datasets import load_dataset, Audio
>>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
```
Wir müssen sicherstellen, dass die Abtastrate des Datensatzes der Abtastrate entspricht, mit der `facebook/wav2vec2-base-960h` trainiert wurde.
```py
>>> dataset = dataset.cast_column("audio", Audio(sampling_rate=speech_recognizer.feature_extractor.sampling_rate))
```
Audiodateien werden automatisch geladen und neu abgetastet, wenn die Spalte "audio" aufgerufen wird.
Extrahieren wir die rohen Wellenform-Arrays der ersten 4 Beispiele und übergeben wir sie als Liste an die Pipeline:
```py
>>> result = speech_recognizer(dataset[:4]["audio"])
>>> print([d["text"] for d in result])
['I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT', "FODING HOW I'D SET UP A JOIN TO HET WITH MY WIFE AND WHERE THE AP MIGHT BE", "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE AP SO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AND I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS", 'HOW DO I THURN A JOIN A COUNT']
```
Bei einem größeren Datensatz mit vielen Eingaben (wie bei Sprache oder Bildverarbeitung) sollten Sie einen Generator anstelle einer Liste übergeben, der alle Eingaben in den Speicher lädt. Weitere Informationen finden Sie in der [Pipeline-Dokumentation](./main_classes/pipelines).
### Ein anderes Modell und einen anderen Tokenizer in der Pipeline verwenden
Die [`pipeline`] kann jedes Modell aus dem [Model Hub] (https://huggingface.co/models) verwenden, wodurch es einfach ist, die [`pipeline`] für andere Anwendungsfälle anzupassen. Wenn Sie beispielsweise ein Modell wünschen, das französischen Text verarbeiten kann, verwenden Sie die Tags im Model Hub, um nach einem geeigneten Modell zu filtern. Das oberste gefilterte Ergebnis liefert ein mehrsprachiges [BERT-Modell](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment), das auf die Stimmungsanalyse abgestimmt ist. Großartig, verwenden wir dieses Modell!
```py
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```
<frameworkcontent>
<pt>
Use the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and it's associated tokenizer (more on an `AutoClass` below):
```py
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
Use the [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and it's associated tokenizer (more on an `TFAutoClass` below):
```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>
Dann können Sie das Modell und den Tokenizer in der [`pipeline`] angeben und den `Klassifikator` auf Ihren Zieltext anwenden:
```py
>>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
>>> classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
[{'label': '5 stars', 'score': 0.7273}]
```
Wenn Sie kein Modell für Ihren Anwendungsfall finden können, müssen Sie ein vortrainiertes Modell auf Ihren Daten feinabstimmen. Schauen Sie sich unser [Feinabstimmungs-Tutorial](./training) an, um zu erfahren, wie das geht. Und schließlich, nachdem Sie Ihr trainiertes Modell verfeinert haben, sollten Sie es mit der Community im Model Hub teilen (siehe Tutorial [hier](./model_sharing)), um NLP für alle zu demokratisieren! 🤗
## AutoClass
<Youtube id="AhChOFRegn4"/>
Unter der Haube arbeiten die Klassen [`AutoModelForSequenceClassification`] und [`AutoTokenizer`] zusammen, um die [`pipeline`] zu betreiben. Eine [`AutoClass`](./model_doc/auto) ist eine Abkürzung, die automatisch die Architektur eines trainierten Modells aus dessen Namen oder Pfad abruft. Sie müssen nur die passende `AutoClass` für Ihre Aufgabe und den zugehörigen Tokenizer mit [`AutoTokenizer`] auswählen.
Kehren wir zu unserem Beispiel zurück und sehen wir uns an, wie Sie die `AutoClass` verwenden können, um die Ergebnisse der [`pipeline`] zu replizieren.
### AutoTokenizer
Ein Tokenizer ist für die Vorverarbeitung von Text in ein für das Modell verständliches Format zuständig. Zunächst zerlegt der Tokenisierer den Text in Wörter, die *Token* genannt werden. Es gibt mehrere Regeln für den Tokenisierungsprozess, z. B. wie und auf welcher Ebene ein Wort aufgespalten wird (weitere Informationen über Tokenisierung [hier](./tokenizer_summary)). Das Wichtigste ist jedoch, dass Sie den Tokenizer mit demselben Modellnamen instanziieren müssen, um sicherzustellen, dass Sie dieselben Tokenisierungsregeln verwenden, mit denen ein Modell zuvor trainiert wurde.
Laden sie einen Tokenizer mit [`AutoTokenizer`]:
```py
>>> from transformers import AutoTokenizer
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
Anschließend wandelt der Tokenizer die Token in Zahlen um, um einen Tensor als Eingabe für das Modell zu konstruieren. Dieser wird als *Vokabular* des Modells bezeichnet.
Übergeben Sie Ihren Text an den Tokenizer:
```py
>>> encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
>>> print(encoding)
{'input_ids': [101, 11312, 10320, 12495, 19308, 10114, 11391, 10855, 10103, 100, 58263, 13299, 119, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
```
Der Tokenizer gibt ein Wörterbuch zurück, das Folgendes enthält:
* [input_ids](./glossary#input-ids): numerische Repräsentationen Ihrer Token.
* [atttention_mask](.glossary#attention-mask): gibt an, welche Token beachtet werden sollen.
Genau wie die [`pipeline`] akzeptiert der Tokenizer eine Liste von Eingaben. Darüber hinaus kann der Tokenizer den Text auch auffüllen und kürzen, um einen Stapel mit einheitlicher Länge zurückzugeben:
<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... max_length=512,
... return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... max_length=512,
... return_tensors="tf",
... )
```
</tf>
</frameworkcontent>
Lesen Sie das Tutorial [preprocessing](./preprocessing) für weitere Details zur Tokenisierung.
### AutoModel
<frameworkcontent>
<pt>
🤗 Transformers bietet eine einfache und einheitliche Möglichkeit, vortrainierte Instanzen zu laden. Das bedeutet, dass Sie ein [`AutoModel`] laden können, wie Sie einen [`AutoTokenizer`] laden würden. Der einzige Unterschied ist die Auswahl des richtigen [`AutoModel`] für die Aufgabe. Da Sie eine Text- oder Sequenzklassifizierung vornehmen, laden Sie [`AutoModelForSequenceClassification`]:
```py
>>> from transformers import AutoModelForSequenceClassification
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
```
<Tip>
In der [Aufgabenzusammenfassung](./task_summary) steht, welche [AutoModel]-Klasse für welche Aufgabe zu verwenden ist.
</Tip>
Jetzt können Sie Ihren vorverarbeiteten Stapel von Eingaben direkt an das Modell übergeben. Sie müssen nur das Wörterbuch entpacken, indem Sie `**` hinzufügen:
```py
>>> pt_outputs = pt_model(**pt_batch)
```
Das Modell gibt die endgültigen Aktivierungen in dem Attribut "logits" aus. Wenden Sie die Softmax-Funktion auf die "logits" an, um die Wahrscheinlichkeiten zu erhalten:
```py
>>> from torch import nn
>>> pt_predictions = nn.functional.softmax(pt_outputs.logits, dim=-1)
>>> print(pt_predictions)
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
[0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
</pt>
<tf>
🤗 Transformers bietet eine einfache und einheitliche Methode zum Laden von vortrainierten Instanzen. Das bedeutet, dass Sie ein [`TFAutoModel`] genauso laden können, wie Sie einen [`AutoTokenizer`] laden würden. Der einzige Unterschied ist die Auswahl des richtigen [`TFAutoModel`] für die Aufgabe. Da Sie Text - oder Sequenz - Klassifizierung machen, laden Sie [`TFAutoModelForSequenceClassification`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```
<Tip>
In der [Aufgabenzusammenfassung](./task_summary) steht, welche [AutoModel]-Klasse für welche Aufgabe zu verwenden ist.
</Tip>
Jetzt können Sie Ihren vorverarbeiteten Stapel von Eingaben direkt an das Modell übergeben, indem Sie die Wörterbuchschlüssel direkt an die Tensoren übergeben:
```py
>>> tf_outputs = tf_model(tf_batch)
```
Das Modell gibt die endgültigen Aktivierungen in dem Attribut "logits" aus. Wenden Sie die Softmax-Funktion auf die "logits" an, um die Wahrscheinlichkeiten zu erhalten:
```py
>>> import tensorflow as tf
>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>
<Tip>
Alle 🤗 Transformers-Modelle (PyTorch oder TensorFlow) geben die Tensoren *vor* der endgültigen Aktivierungsfunktion
Funktion (wie Softmax) aus, da die endgültige Aktivierungsfunktion oft mit dem Verlusten verschmolzen ist.
</Tip>
Modelle sind ein standardmäßiges [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) oder ein [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model), sodass Sie sie in Ihrer üblichen Trainingsschleife verwenden können. Um jedoch die Dinge einfacher zu machen, bietet 🤗 Transformers eine [`Trainer`]-Klasse für PyTorch, die Funktionalität für verteiltes Training, gemischte Präzision und mehr bietet. Für TensorFlow können Sie die Methode `fit` aus [Keras](https://keras.io/) verwenden. Siehe das [training tutorial](./training) für weitere Details.
<Tip>
Transformers-Modellausgaben sind spezielle Datenklassen, so dass ihre Attribute in einer IDE automatisch vervollständigt werden.
Die Modellausgänge verhalten sich auch wie ein Tupel oder ein Wörterbuch (z.B. können Sie mit einem Integer, einem Slice oder einem String indexieren), wobei die Attribute, die "None" sind, ignoriert werden.
</Tip>
### Modell speichern
<frameworkcontent>
<pt>
Sobald Ihr Modell feinabgestimmt ist, können Sie es mit seinem Tokenizer speichern, indem Sie [`PreTrainedModel.save_pretrained`] verwenden:
```py
>>> pt_save_directory = "./pt_save_pretrained"
>>> tokenizer.save_pretrained(pt_save_directory) # doctest: +IGNORE_RESULT
>>> pt_model.save_pretrained(pt_save_directory)
```
Wenn Sie bereit sind, das Modell erneut zu verwenden, laden Sie es mit [`PreTrainedModel.from_pretrained`]:
```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
Sobald Ihr Modell feinabgestimmt ist, können Sie es mit seinem Tokenizer unter Verwendung von [`TFPreTrainedModel.save_pretrained`] speichern:
```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory) # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```
Wenn Sie bereit sind, das Modell wieder zu verwenden, laden Sie es mit [`TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>
Ein besonders cooles 🤗 Transformers-Feature ist die Möglichkeit, ein Modell zu speichern und es entweder als PyTorch- oder TensorFlow-Modell wieder zu laden. Der Parameter "from_pt" oder "from_tf" kann das Modell von einem Framework in das andere konvertieren:
<frameworkcontent>
<pt>
```py
>>> from transformers import AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</pt>
<tf>
```py
>>> from transformers import TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</tf>
</frameworkcontent>
## Custom model builds
Sie können die Konfigurationsklasse des Modells ändern, um zu bestimmen, wie ein Modell aufgebaut ist. Die Konfiguration legt die Attribute eines Modells fest, z. B. die Anzahl der verborgenen Schichten oder der Aufmerksamkeitsköpfe. Wenn Sie ein Modell aus einer benutzerdefinierten Konfigurationsklasse initialisieren, beginnen Sie bei Null. Die Modellattribute werden zufällig initialisiert, und Sie müssen das Modell trainieren, bevor Sie es verwenden können, um aussagekräftige Ergebnisse zu erhalten.
Beginnen Sie mit dem Import von [`AutoConfig`] und laden Sie dann das trainierte Modell, das Sie ändern möchten. Innerhalb von [`AutoConfig.from_pretrained`] können Sie das Attribut angeben, das Sie ändern möchten, z. B. die Anzahl der Aufmerksamkeitsköpfe:
```py
>>> from transformers import AutoConfig
>>> my_config = AutoConfig.from_pretrained("distilbert-base-uncased", n_heads=12)
```
<frameworkcontent>
<pt>
Create a model from your custom configuration with [`AutoModel.from_config`]:
```py
>>> from transformers import AutoModel
>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
Create a model from your custom configuration with [`TFAutoModel.from_config`]:
```py
>>> from transformers import TFAutoModel
>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>
Weitere Informationen zur Erstellung von benutzerdefinierten Konfigurationen finden Sie in der Anleitung [Erstellen einer benutzerdefinierten Architektur](./create_a_model).
## Wie geht es weiter?
Nachdem Sie nun die 🤗 Transformers-Kurztour abgeschlossen haben, schauen Sie sich unsere Anleitungen an und erfahren Sie, wie Sie spezifischere Dinge tun können, wie das Schreiben eines benutzerdefinierten Modells, die Feinabstimmung eines Modells für eine Aufgabe und wie man ein Modell mit einem Skript trainiert. Wenn Sie mehr über die Kernkonzepte von 🤗 Transformers erfahren möchten, nehmen Sie sich eine Tasse Kaffee und werfen Sie einen Blick auf unsere konzeptionellen Leitfäden!

View File

@ -1,4 +1,4 @@
- sections:
- sections:
- local: index
title: 🤗 Transformers
- local: quicktour
@ -21,70 +21,107 @@
title: Share a model
title: Tutorials
- sections:
- local: fast_tokenizers
title: "Use tokenizers from 🤗 Tokenizers"
- local: create_a_model
title: Create a custom architecture
- local: custom_models
title: Sharing custom models
- sections:
- local: tasks/sequence_classification
title: Text classification
- local: tasks/token_classification
title: Token classification
- local: tasks/question_answering
title: Question answering
- local: tasks/language_modeling
title: Language modeling
- local: tasks/translation
title: Translation
- local: tasks/summarization
title: Summarization
- local: tasks/multiple_choice
title: Multiple choice
- local: create_a_model
title: Create a custom architecture
- local: custom_models
title: Sharing custom models
- local: run_scripts
title: Train with a script
- local: sagemaker
title: Run training on Amazon SageMaker
- local: converting_tensorflow_models
title: Converting TensorFlow Checkpoints
- local: serialization
title: Export 🤗 Transformers models
- local: troubleshooting
title: Troubleshoot
title: General usage
- sections:
- local: fast_tokenizers
title: Use tokenizers from 🤗 Tokenizers
- local: multilingual
title: Inference for multilingual models
- sections:
- local: tasks/sequence_classification
title: Text classification
- local: tasks/token_classification
title: Token classification
- local: tasks/question_answering
title: Question answering
- local: tasks/language_modeling
title: Language modeling
- local: tasks/translation
title: Translation
- local: tasks/summarization
title: Summarization
- local: tasks/multiple_choice
title: Multiple choice
title: Task guides
isExpanded: false
title: Natural Language Processing
- sections:
- local: tasks/audio_classification
title: Audio classification
- local: tasks/asr
title: Automatic speech recognition
title: Audio
- sections:
- local: tasks/image_classification
title: Image classification
title: Fine-tune for downstream tasks
- local: run_scripts
title: Train with a script
- local: sagemaker
title: Run training on Amazon SageMaker
- local: multilingual
title: Inference for multilingual models
- local: converting_tensorflow_models
title: Converting TensorFlow Checkpoints
- local: serialization
title: Export 🤗 Transformers models
- local: performance
title: 'Performance and Scalability: How To Fit a Bigger Model and Train It Faster'
- local: parallelism
title: Model Parallelism
- local: tasks/semantic_segmentation
title: Semantic segmentation
title: Computer Vision
- sections:
- local: performance
title: Overview
- local: perf_train_gpu_one
title: Training on one GPU
- local: perf_train_gpu_many
title: Training on many GPUs
- local: perf_train_cpu
title: Training on CPU
- local: perf_train_cpu_many
title: Training on many CPUs
- local: perf_train_tpu
title: Training on TPUs
- local: perf_train_special
title: Training on Specialized Hardware
- local: perf_infer_cpu
title: Inference on CPU
- local: perf_infer_gpu_one
title: Inference on one GPU
- local: perf_infer_gpu_many
title: Inference on many GPUs
- local: perf_infer_special
title: Inference on Specialized Hardware
- local: perf_hardware
title: Custom hardware for training
- local: big_models
title: Instantiating a big model
- local: debugging
title: Debugging
title: Performance and scalability
- sections:
- local: contributing
title: How to contribute to transformers?
- local: add_new_model
title: How to add a model to 🤗 Transformers?
- local: add_new_pipeline
title: How to add a pipeline to 🤗 Transformers?
- local: testing
title: Testing
- local: pr_checks
title: Checks on a Pull Request
title: Contribute
- local: notebooks
title: 🤗 Transformers Notebooks
- local: community
title: Community resources
- local: benchmarks
title: Benchmarks
- local: migration
title: Migrating from previous packages
- local: troubleshooting
title: Troubleshoot
- local: debugging
title: Debugging
- local: notebooks
title: "🤗 Transformers Notebooks"
- local: community
title: Community
- local: contributing
title: How to contribute to transformers?
- local: add_new_model
title: "How to add a model to 🤗 Transformers?"
- local: add_new_pipeline
title: "How to add a pipeline to 🤗 Transformers?"
- local: testing
title: Testing
- local: pr_checks
title: Checks on a Pull Request
title: How-to guides
- sections:
- local: philosophy
@ -140,244 +177,311 @@
title: Feature Extractor
title: Main Classes
- sections:
- local: model_doc/albert
title: ALBERT
- local: model_doc/auto
title: Auto Classes
- local: model_doc/bart
title: BART
- local: model_doc/barthez
title: BARThez
- local: model_doc/bartpho
title: BARTpho
- local: model_doc/beit
title: BEiT
- local: model_doc/bert
title: BERT
- local: model_doc/bertweet
title: Bertweet
- local: model_doc/bert-generation
title: BertGeneration
- local: model_doc/bert-japanese
title: BertJapanese
- local: model_doc/big_bird
title: BigBird
- local: model_doc/bigbird_pegasus
title: BigBirdPegasus
- local: model_doc/blenderbot
title: Blenderbot
- local: model_doc/blenderbot-small
title: Blenderbot Small
- local: model_doc/bort
title: BORT
- local: model_doc/byt5
title: ByT5
- local: model_doc/camembert
title: CamemBERT
- local: model_doc/canine
title: CANINE
- local: model_doc/convnext
title: ConvNeXT
- local: model_doc/clip
title: CLIP
- local: model_doc/convbert
title: ConvBERT
- local: model_doc/cpm
title: CPM
- local: model_doc/ctrl
title: CTRL
- local: model_doc/data2vec
title: Data2Vec
- local: model_doc/deberta
title: DeBERTa
- local: model_doc/deberta-v2
title: DeBERTa-v2
- local: model_doc/decision_transformer
title: Decision Transformer
- local: model_doc/deit
title: DeiT
- local: model_doc/detr
title: DETR
- local: model_doc/dialogpt
title: DialoGPT
- local: model_doc/distilbert
title: DistilBERT
- local: model_doc/dit
title: DiT
- local: model_doc/dpr
title: DPR
- local: model_doc/dpt
title: DPT
- local: model_doc/electra
title: ELECTRA
- local: model_doc/encoder-decoder
title: Encoder Decoder Models
- local: model_doc/flaubert
title: FlauBERT
- local: model_doc/fnet
title: FNet
- local: model_doc/fsmt
title: FSMT
- local: model_doc/funnel
title: Funnel Transformer
- local: model_doc/glpn
title: GLPN
- local: model_doc/herbert
title: HerBERT
- local: model_doc/ibert
title: I-BERT
- local: model_doc/imagegpt
title: ImageGPT
- local: model_doc/layoutlm
title: LayoutLM
- local: model_doc/layoutlmv2
title: LayoutLMV2
- local: model_doc/layoutxlm
title: LayoutXLM
- local: model_doc/led
title: LED
- local: model_doc/longformer
title: Longformer
- local: model_doc/luke
title: LUKE
- local: model_doc/lxmert
title: LXMERT
- local: model_doc/marian
title: MarianMT
- local: model_doc/maskformer
title: MaskFormer
- local: model_doc/m2m_100
title: M2M100
- local: model_doc/mbart
title: MBart and MBart-50
- local: model_doc/megatron-bert
title: MegatronBERT
- local: model_doc/megatron_gpt2
title: MegatronGPT2
- local: model_doc/mluke
title: mLUKE
- local: model_doc/mobilebert
title: MobileBERT
- local: model_doc/mpnet
title: MPNet
- local: model_doc/mt5
title: MT5
- local: model_doc/nystromformer
title: Nyströmformer
- local: model_doc/openai-gpt
title: OpenAI GPT
- local: model_doc/gpt2
title: OpenAI GPT2
- local: model_doc/gptj
title: GPT-J
- local: model_doc/gpt_neo
title: GPT Neo
- local: model_doc/hubert
title: Hubert
- local: model_doc/perceiver
title: Perceiver
- local: model_doc/pegasus
title: Pegasus
- local: model_doc/phobert
title: PhoBERT
- local: model_doc/plbart
title: PLBart
- local: model_doc/poolformer
title: PoolFormer
- local: model_doc/prophetnet
title: ProphetNet
- local: model_doc/qdqbert
title: QDQBert
- local: model_doc/rag
title: RAG
- local: model_doc/realm
title: REALM
- local: model_doc/reformer
title: Reformer
- local: model_doc/rembert
title: RemBERT
- local: model_doc/regnet
title: RegNet
- local: model_doc/resnet
title: ResNet
- local: model_doc/retribert
title: RetriBERT
- local: model_doc/roberta
title: RoBERTa
- local: model_doc/roformer
title: RoFormer
- local: model_doc/segformer
title: SegFormer
- local: model_doc/sew
title: SEW
- local: model_doc/sew-d
title: SEW-D
- local: model_doc/speech-encoder-decoder
title: Speech Encoder Decoder Models
- local: model_doc/speech_to_text
title: Speech2Text
- local: model_doc/speech_to_text_2
title: Speech2Text2
- local: model_doc/splinter
title: Splinter
- local: model_doc/squeezebert
title: SqueezeBERT
- local: model_doc/swin
title: Swin Transformer
- local: model_doc/t5
title: T5
- local: model_doc/t5v1.1
title: T5v1.1
- local: model_doc/tapas
title: TAPAS
- local: model_doc/tapex
title: TAPEX
- local: model_doc/transfo-xl
title: Transformer XL
- local: model_doc/trocr
title: TrOCR
- local: model_doc/unispeech
title: UniSpeech
- local: model_doc/unispeech-sat
title: UniSpeech-SAT
- local: model_doc/van
title: VAN
- local: model_doc/vilt
title: ViLT
- local: model_doc/vision-encoder-decoder
title: Vision Encoder Decoder Models
- local: model_doc/vision-text-dual-encoder
title: Vision Text Dual Encoder
- local: model_doc/vit
title: Vision Transformer (ViT)
- local: model_doc/vit_mae
title: ViTMAE
- local: model_doc/visual_bert
title: VisualBERT
- local: model_doc/wav2vec2
title: Wav2Vec2
- local: model_doc/wav2vec2_phoneme
title: Wav2Vec2Phoneme
- local: model_doc/wavlm
title: WavLM
- local: model_doc/xglm
title: XGLM
- local: model_doc/xlm
title: XLM
- local: model_doc/xlm-prophetnet
title: XLM-ProphetNet
- local: model_doc/xlm-roberta
title: XLM-RoBERTa
- local: model_doc/xlm-roberta-xl
title: XLM-RoBERTa-XL
- local: model_doc/xlnet
title: XLNet
- local: model_doc/xlsr_wav2vec2
title: XLSR-Wav2Vec2
- local: model_doc/xls_r
title: XLS-R
- local: model_doc/yoso
title: YOSO
- isExpanded: false
sections:
- local: model_doc/albert
title: ALBERT
- local: model_doc/bart
title: BART
- local: model_doc/barthez
title: BARThez
- local: model_doc/bartpho
title: BARTpho
- local: model_doc/bert
title: BERT
- local: model_doc/bert-generation
title: BertGeneration
- local: model_doc/bert-japanese
title: BertJapanese
- local: model_doc/bertweet
title: Bertweet
- local: model_doc/big_bird
title: BigBird
- local: model_doc/bigbird_pegasus
title: BigBirdPegasus
- local: model_doc/blenderbot
title: Blenderbot
- local: model_doc/blenderbot-small
title: Blenderbot Small
- local: model_doc/bloom
title: BLOOM
- local: model_doc/bort
title: BORT
- local: model_doc/byt5
title: ByT5
- local: model_doc/camembert
title: CamemBERT
- local: model_doc/canine
title: CANINE
- local: model_doc/codegen
title: CodeGen
- local: model_doc/convbert
title: ConvBERT
- local: model_doc/cpm
title: CPM
- local: model_doc/ctrl
title: CTRL
- local: model_doc/deberta
title: DeBERTa
- local: model_doc/deberta-v2
title: DeBERTa-v2
- local: model_doc/dialogpt
title: DialoGPT
- local: model_doc/distilbert
title: DistilBERT
- local: model_doc/dpr
title: DPR
- local: model_doc/electra
title: ELECTRA
- local: model_doc/encoder-decoder
title: Encoder Decoder Models
- local: model_doc/ernie
title: ERNIE
- local: model_doc/flaubert
title: FlauBERT
- local: model_doc/fnet
title: FNet
- local: model_doc/fsmt
title: FSMT
- local: model_doc/funnel
title: Funnel Transformer
- local: model_doc/openai-gpt
title: GPT
- local: model_doc/gpt_neo
title: GPT Neo
- local: model_doc/gpt_neox
title: GPT NeoX
- local: model_doc/gptj
title: GPT-J
- local: model_doc/gpt2
title: GPT2
- local: model_doc/herbert
title: HerBERT
- local: model_doc/ibert
title: I-BERT
- local: model_doc/layoutlm
title: LayoutLM
- local: model_doc/led
title: LED
- local: model_doc/longformer
title: Longformer
- local: model_doc/longt5
title: LongT5
- local: model_doc/luke
title: LUKE
- local: model_doc/m2m_100
title: M2M100
- local: model_doc/marian
title: MarianMT
- local: model_doc/mbart
title: MBart and MBart-50
- local: model_doc/megatron-bert
title: MegatronBERT
- local: model_doc/megatron_gpt2
title: MegatronGPT2
- local: model_doc/mluke
title: mLUKE
- local: model_doc/mobilebert
title: MobileBERT
- local: model_doc/mpnet
title: MPNet
- local: model_doc/mt5
title: MT5
- local: model_doc/mvp
title: MVP
- local: model_doc/nezha
title: NEZHA
- local: model_doc/nllb
title: NLLB
- local: model_doc/nystromformer
title: Nyströmformer
- local: model_doc/opt
title: OPT
- local: model_doc/pegasus
title: Pegasus
- local: model_doc/pegasus_x
title: PEGASUS-X
- local: model_doc/phobert
title: PhoBERT
- local: model_doc/plbart
title: PLBart
- local: model_doc/prophetnet
title: ProphetNet
- local: model_doc/qdqbert
title: QDQBert
- local: model_doc/rag
title: RAG
- local: model_doc/realm
title: REALM
- local: model_doc/reformer
title: Reformer
- local: model_doc/rembert
title: RemBERT
- local: model_doc/retribert
title: RetriBERT
- local: model_doc/roberta
title: RoBERTa
- local: model_doc/roformer
title: RoFormer
- local: model_doc/splinter
title: Splinter
- local: model_doc/squeezebert
title: SqueezeBERT
- local: model_doc/t5
title: T5
- local: model_doc/t5v1.1
title: T5v1.1
- local: model_doc/tapas
title: TAPAS
- local: model_doc/tapex
title: TAPEX
- local: model_doc/transfo-xl
title: Transformer XL
- local: model_doc/ul2
title: UL2
- local: model_doc/xglm
title: XGLM
- local: model_doc/xlm
title: XLM
- local: model_doc/xlm-prophetnet
title: XLM-ProphetNet
- local: model_doc/xlm-roberta
title: XLM-RoBERTa
- local: model_doc/xlm-roberta-xl
title: XLM-RoBERTa-XL
- local: model_doc/xlnet
title: XLNet
- local: model_doc/yoso
title: YOSO
title: Text models
- isExpanded: false
sections:
- local: model_doc/beit
title: BEiT
- local: model_doc/convnext
title: ConvNeXT
- local: model_doc/cvt
title: CvT
- local: model_doc/deit
title: DeiT
- local: model_doc/detr
title: DETR
- local: model_doc/dit
title: DiT
- local: model_doc/dpt
title: DPT
- local: model_doc/glpn
title: GLPN
- local: model_doc/imagegpt
title: ImageGPT
- local: model_doc/levit
title: LeViT
- local: model_doc/maskformer
title: MaskFormer
- local: model_doc/mobilevit
title: MobileViT
- local: model_doc/poolformer
title: PoolFormer
- local: model_doc/regnet
title: RegNet
- local: model_doc/resnet
title: ResNet
- local: model_doc/segformer
title: SegFormer
- local: model_doc/swin
title: Swin Transformer
- local: model_doc/swinv2
title: Swin Transformer V2
- local: model_doc/van
title: VAN
- local: model_doc/videomae
title: VideoMAE
- local: model_doc/vit
title: Vision Transformer (ViT)
- local: model_doc/vit_mae
title: ViTMAE
- local: model_doc/yolos
title: YOLOS
title: Vision models
- isExpanded: false
sections:
- local: model_doc/hubert
title: Hubert
- local: model_doc/mctct
title: MCTCT
- local: model_doc/sew
title: SEW
- local: model_doc/sew-d
title: SEW-D
- local: model_doc/speech_to_text
title: Speech2Text
- local: model_doc/speech_to_text_2
title: Speech2Text2
- local: model_doc/unispeech
title: UniSpeech
- local: model_doc/unispeech-sat
title: UniSpeech-SAT
- local: model_doc/wav2vec2
title: Wav2Vec2
- local: model_doc/wav2vec2-conformer
title: Wav2Vec2-Conformer
- local: model_doc/wav2vec2_phoneme
title: Wav2Vec2Phoneme
- local: model_doc/wavlm
title: WavLM
- local: model_doc/xls_r
title: XLS-R
- local: model_doc/xlsr_wav2vec2
title: XLSR-Wav2Vec2
title: Audio models
- isExpanded: false
sections:
- local: model_doc/clip
title: CLIP
- local: model_doc/data2vec
title: Data2Vec
- local: model_doc/donut
title: Donut
- local: model_doc/flava
title: FLAVA
- local: model_doc/groupvit
title: GroupViT
- local: model_doc/layoutlmv2
title: LayoutLMV2
- local: model_doc/layoutlmv3
title: LayoutLMV3
- local: model_doc/layoutxlm
title: LayoutXLM
- local: model_doc/lxmert
title: LXMERT
- local: model_doc/owlvit
title: OWL-ViT
- local: model_doc/perceiver
title: Perceiver
- local: model_doc/speech-encoder-decoder
title: Speech Encoder Decoder Models
- local: model_doc/trocr
title: TrOCR
- local: model_doc/vilt
title: ViLT
- local: model_doc/vision-encoder-decoder
title: Vision Encoder Decoder Models
- local: model_doc/vision-text-dual-encoder
title: Vision Text Dual Encoder
- local: model_doc/visual_bert
title: VisualBERT
- local: model_doc/xclip
title: X-CLIP
title: Multimodal models
- isExpanded: false
sections:
- local: model_doc/decision_transformer
title: Decision Transformer
- local: model_doc/trajectory_transformer
title: Trajectory Transformer
title: Reinforcement learning models
title: Models
- sections:
- local: internal/modeling_utils
@ -393,4 +497,4 @@
- local: internal/file_utils
title: General Utilities
title: Internal Helpers
title: API
title: API

View File

@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# Distributed training with 🤗 Accelerate
As models get bigger, parallelism has emerged as a strategy for training larger models on limited hardware and accelerating training speed by several orders of magnitude. At Hugging Face, we created the [🤗 Accelerate](https://huggingface.co/docs/accelerate/index.html) library to help users easily train a 🤗 Transformers model on any type of distributed setup, whether it is multiple GPU's on one machine or multiple GPU's across several machines. In this tutorial, learn how to customize your native PyTorch training loop to enable training in a distributed environment.
As models get bigger, parallelism has emerged as a strategy for training larger models on limited hardware and accelerating training speed by several orders of magnitude. At Hugging Face, we created the [🤗 Accelerate](https://huggingface.co/docs/accelerate) library to help users easily train a 🤗 Transformers model on any type of distributed setup, whether it is multiple GPU's on one machine or multiple GPU's across several machines. In this tutorial, learn how to customize your native PyTorch training loop to enable training in a distributed environment.
## Setup
@ -22,7 +22,7 @@ Get started by installing 🤗 Accelerate:
pip install accelerate
```
Then import and create an [`Accelerator`](https://huggingface.co/docs/accelerate/accelerator.html#accelerate.Accelerator) object. `Accelerator` will automatically detect your type of distributed setup and initialize all the necessary components for training. You don't need to explicitly place your model on a device.
Then import and create an [`~accelerate.Accelerator`] object. The [`~accelerate.Accelerator`] will automatically detect your type of distributed setup and initialize all the necessary components for training. You don't need to explicitly place your model on a device.
```py
>>> from accelerate import Accelerator
@ -32,7 +32,7 @@ Then import and create an [`Accelerator`](https://huggingface.co/docs/accelerate
## Prepare to accelerate
The next step is to pass all the relevant training objects to the [`prepare`](https://huggingface.co/docs/accelerate/accelerator.html#accelerate.Accelerator.prepare) method. This includes your training and evaluation DataLoaders, a model and an optimizer:
The next step is to pass all the relevant training objects to the [`~accelerate.Accelerator.prepare`] method. This includes your training and evaluation DataLoaders, a model and an optimizer:
```py
>>> train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
@ -42,7 +42,7 @@ The next step is to pass all the relevant training objects to the [`prepare`](ht
## Backward
The last addition is to replace the typical `loss.backward()` in your training loop with 🤗 Accelerate's [`backward`](https://huggingface.co/docs/accelerate/accelerator.html#accelerate.Accelerator.backward) method:
The last addition is to replace the typical `loss.backward()` in your training loop with 🤗 Accelerate's [`~accelerate.Accelerator.backward`]method:
```py
>>> for epoch in range(num_epochs):
@ -121,7 +121,7 @@ accelerate launch train.py
### Train with a notebook
🤗 Accelerate can also run in a notebook if you're planning on using Colaboratory's TPUs. Wrap all the code responsible for training in a function, and pass it to `notebook_launcher`:
🤗 Accelerate can also run in a notebook if you're planning on using Colaboratory's TPUs. Wrap all the code responsible for training in a function, and pass it to [`~accelerate.notebook_launcher`]:
```py
>>> from accelerate import notebook_launcher
@ -129,4 +129,4 @@ accelerate launch train.py
>>> notebook_launcher(training_function)
```
For more information about 🤗 Accelerate and it's rich features, refer to the [documentation](https://huggingface.co/docs/accelerate/index.html).
For more information about 🤗 Accelerate and it's rich features, refer to the [documentation](https://huggingface.co/docs/accelerate).

View File

@ -813,13 +813,9 @@ checkpoint and to get the required access rights to be able to upload the model
*brand_new_bert*. The `push_to_hub` method, present in all models in `transformers`, is a quick and efficient way to push your checkpoint to the hub. A little snippet is pasted below:
```python
brand_new_bert.push_to_hub(
repo_path_or_name="brand_new_bert",
# Uncomment the following line to push to an organization
# organization="<ORGANIZATION>",
commit_message="Add model",
use_temp_dir=True,
)
brand_new_bert.push_to_hub("brand_new_bert")
# Uncomment the following line to push to an organization.
# brand_new_bert.push_to_hub("<organization>/brand_new_bert")
```
It is worth spending some time to create fitting model cards for each checkpoint. The model cards should highlight the

View File

@ -9,7 +9,10 @@ Unless required by applicable law or agreed to in writing, software distributed
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-->
# How to add a pipeline to 🤗 Transformers?
# How to create a custom pipeline?
In this guide, we will see how to create a custom pipeline and share it on the [Hub](hf.co/models) or add it to the
Transformers library.
First and foremost, you need to decide the raw entries the pipeline will be able to take. It can be strings, raw bytes,
dictionaries or whatever seems to be the most likely desired input. Try to keep these inputs as pure Python as possible
@ -99,7 +102,7 @@ def _sanitize_parameters(self, **kwargs):
postprocess_kwargs = {}
if "top_k" in kwargs:
preprocess_kwargs["top_k"] = kwargs["top_k"]
postprocess_kwargs["top_k"] = kwargs["top_k"]
return preprocess_kwargs, {}, postprocess_kwargs
```
@ -111,12 +114,123 @@ of arguments for ease of use (audio files, can be filenames, URLs or pure bytes)
## Adding it to the list of supported tasks
Go to `src/transformers/pipelines/__init__.py` and fill in `SUPPORTED_TASKS` with your newly created pipeline.
If possible it should provide a default model.
To register your `new-task` to the list of supported tasks, you have to add it to the `PIPELINE_REGISTRY`:
## Adding tests
```python
from transformers.pipelines import PIPELINE_REGISTRY
Create a new file `tests/test_pipelines_MY_PIPELINE.py` with example with the other tests.
PIPELINE_REGISTRY.register_pipeline(
"new-task",
pipeline_class=MyPipeline,
pt_model=AutoModelForSequenceClassification,
)
```
You can specify a default model if you want, in which case it should come with a specific revision (which can be the name of a branch or a commit hash, here we took `"abcdef"`) as well was the type:
```python
PIPELINE_REGISTRY.register_pipeline(
"new-task",
pipeline_class=MyPipeline,
pt_model=AutoModelForSequenceClassification,
default={"pt": ("user/awesome_model", "abcdef")},
type="text", # current support type: text, audio, image, multimodal
)
```
## Share your pipeline on the Hub
To share your custom pipeline on the Hub, you just have to save the custom code of your `Pipeline` subclass in a
python file. For instance, let's say we want to use a custom pipeline for sentence pair classification like this:
```py
import numpy as np
from transformers import Pipeline
def softmax(outputs):
maxes = np.max(outputs, axis=-1, keepdims=True)
shifted_exp = np.exp(outputs - maxes)
return shifted_exp / shifted_exp.sum(axis=-1, keepdims=True)
class PairClassificationPipeline(Pipeline):
def _sanitize_parameters(self, **kwargs):
preprocess_kwargs = {}
if "second_text" in kwargs:
preprocess_kwargs["second_text"] = kwargs["second_text"]
return preprocess_kwargs, {}, {}
def preprocess(self, text, second_text=None):
return self.tokenizer(text, text_pair=second_text, return_tensors=self.framework)
def _forward(self, model_inputs):
return self.model(**model_inputs)
def postprocess(self, model_outputs):
logits = model_outputs.logits[0].numpy()
probabilities = softmax(logits)
best_class = np.argmax(probabilities)
label = self.model.config.id2label[best_class]
score = probabilities[best_class].item()
logits = logits.tolist()
return {"label": label, "score": score, "logits": logits}
```
The implementation is framework agnostic, and will work for PyTorch and TensorFlow models. If we have saved this in
a file named `pair_classification.py`, we can then import it and register it like this:
```py
from pair_classification import PairClassificationPipeline
from transformers.pipelines import PIPELINE_REGISTRY
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
PIPELINE_REGISTRY.register_pipeline(
"pair-classification",
pipeline_class=PairClassificationPipeline,
pt_model=AutoModelForSequenceClassification,
tf_model=TFAutoModelForSequenceClassification,
)
```
Once this is done, we can use it with a pretrained model. For instance `sgugger/finetuned-bert-mrpc` has been
fine-tuned on the MRPC dataset, which classifies pairs of sentences as paraphrases or not.
```py
from transformers import pipeline
classifier = pipeline("pair-classification", model="sgugger/finetuned-bert-mrpc")
```
Then we can share it on the Hub by using the `save_pretrained` method in a `Repository`:
```py
from huggingface_hub import Repository
repo = Repository("test-dynamic-pipeline", clone_from="{your_username}/test-dynamic-pipeline")
classifier.save_pretrained("test-dynamic-pipeline")
repo.push_to_hub()
```
This will copy the file where you defined `PairClassificationPipeline` inside the folder `"test-dynamic-pipeline"`,
along with saving the model and tokenizer of the pipeline, before pushing everything in the repository
`{your_username}/test-dynamic-pipeline`. After that anyone can use it as long as they provide the option
`trust_remote_code=True`:
```py
from transformers import pipeline
classifier = pipeline(model="{your_username}/test-dynamic-pipeline", trust_remote_code=True)
```
## Add the pipeline to Transformers
If you want to contribute your pipeline to Transformers, you will need to add a new module in the `pipelines` submodule
with the code of your pipeline, then add it in the list of tasks defined in `pipelines/__init__.py`.
Then you will need to add tests. Create a new file `tests/test_pipelines_MY_PIPELINE.py` with example with the other tests.
The `run_pipeline_test` function will be very generic and run on small random models on every possible
architecture as defined by `model_mapping` and `tf_model_mapping`.

View File

@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# Load pretrained instances with an AutoClass
With so many different Transformer architectures, it can be challenging to create one for your checkpoint. As a part of 🤗 Transformers core philosophy to make the library easy, simple and flexible to use, an `AutoClass` automatically infer and load the correct architecture from a given checkpoint. The `from_pretrained` method lets you quickly load a pretrained model for any architecture so you don't have to devote time and resources to train a model from scratch. Producing this type of checkpoint-agnostic code means if your code works for one checkpoint, it will work with another checkpoint - as long as it was trained for a similar task - even if the architecture is different.
With so many different Transformer architectures, it can be challenging to create one for your checkpoint. As a part of 🤗 Transformers core philosophy to make the library easy, simple and flexible to use, an `AutoClass` automatically infer and load the correct architecture from a given checkpoint. The `from_pretrained()` method lets you quickly load a pretrained model for any architecture so you don't have to devote time and resources to train a model from scratch. Producing this type of checkpoint-agnostic code means if your code works for one checkpoint, it will work with another checkpoint - as long as it was trained for a similar task - even if the architecture is different.
<Tip>
@ -95,6 +95,14 @@ Easily reuse the same checkpoint to load an architecture for a different task:
>>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
```
<Tip warning={true}>
For PyTorch models, the `from_pretrained()` method uses `torch.load()` which internally uses `pickle` and is known to be insecure. In general, never load a model that could have come from an untrusted source, or that could have been tampered with. This security risk is partially mitigated for public models hosted on the Hugging Face Hub, which are [scanned for malware](https://huggingface.co/docs/hub/security-malware) at each commit. See the [Hub documentation](https://huggingface.co/docs/hub/security) for best practices like [signed commit verification](https://huggingface.co/docs/hub/security-gpg#signing-commits-with-gpg) with GPG.
TensorFlow and Flax checkpoints are not affected, and can be loaded within PyTorch architectures using the `from_tf` and `from_flax` kwargs for the `from_pretrained` method to circumvent this issue.
</Tip>
Generally, we recommend using the `AutoTokenizer` class and the `AutoModelFor` class to load pretrained instances of models. This will ensure you load the correct architecture every time. In the next [tutorial](preprocessing), learn how to use your newly loaded tokenizer, feature extractor and processor to preprocess a dataset for fine-tuning.
</pt>
<tf>

View File

@ -0,0 +1,119 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Instantiating a big model
When you want to use a very big pretrained model, one challenge is to minimize the use of the RAM. The usual workflow
from PyTorch is:
1. Create your model with random weights.
2. Load your pretrained weights.
3. Put those pretrained weights in your random model.
Step 1 and 2 both require a full version of the model in memory, which is not a problem in most cases, but if your model starts weighing several GigaBytes, those two copies can make you got our of RAM. Even worse, if you are using `torch.distributed` to launch a distributed training, each process will load the pretrained model and store these two copies in RAM.
<Tip>
Note that the randomly created model is initialized with "empty" tensors, which take the space in memory without filling it (thus the random values are whatever was in this chunk of memory at a given time). The random initialization following the appropriate distribution for the kind of model/parameters instatiated (like a normal distribution for instance) is only performed after step 3 on the non-initialized weights, to be as fast as possible!
</Tip>
In this guide, we explore the solutions Transformers offer to deal with this issue. Note that this is an area of active development, so the APIs explained here may change slightly in the future.
## Sharded checkpoints
Since version 4.18.0, model checkpoints that end up taking more than 10GB of space are automatically sharded in smaller pieces. In terms of having one single checkpoint when you do `model.save_pretrained(save_dir)`, you will end up with several partial checkpoints (each of which being of size < 10GB) and an index that maps parameter names to the files they are stored in.
You can control the maximum size before sharding with the `max_shard_size` parameter, so for the sake of an example, we'll use a normal-size models with a small shard size: let's take a traditional BERT model.
```py
from transformers import AutoModel
model = AutoModel.from_pretrained("bert-base-cased")
```
If you save it using [`~PreTrainedModel.save_pretrained`], you will get a new folder with two files: the config of the model and its weights:
```py
>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmp_dir:
... model.save_pretrained(tmp_dir)
... print(sorted(os.listdir(tmp_dir)))
['config.json', 'pytorch_model.bin']
```
Now let's use a maximum shard size of 200MB:
```py
>>> with tempfile.TemporaryDirectory() as tmp_dir:
... model.save_pretrained(tmp_dir, max_shard_size="200MB")
... print(sorted(os.listdir(tmp_dir)))
['config.json', 'pytorch_model-00001-of-00003.bin', 'pytorch_model-00002-of-00003.bin', 'pytorch_model-00003-of-00003.bin', 'pytorch_model.bin.index.json']
```
On top of the configuration of the model, we see three different weights files, and an `index.json` file which is our index. A checkpoint like this can be fully reloaded using the [`~PreTrainedModel.from_pretrained`] method:
```py
>>> with tempfile.TemporaryDirectory() as tmp_dir:
... model.save_pretrained(tmp_dir, max_shard_size="200MB")
... new_model = AutoModel.from_pretrained(tmp_dir)
```
The main advantage of doing this for big models is that during step 2 of the workflow shown above, each shard of the checkpoint is loaded after the previous one, capping the memory usage in RAM to the model size plus the size of the biggest shard.
Beind the scenes, the index file is used to determine which keys are in the checkpoint, and where the corresponding weights are stored. We can load that index like any json and get a dictionary:
```py
>>> import json
>>> with tempfile.TemporaryDirectory() as tmp_dir:
... model.save_pretrained(tmp_dir, max_shard_size="200MB")
... with open(os.path.join(tmp_dir, "pytorch_model.bin.index.json"), "r") as f:
... index = json.load(f)
>>> print(index.keys())
dict_keys(['metadata', 'weight_map'])
```
The metadata just consists of the total size of the model for now. We plan to add several other informations in the future:
```py
>>> index["metadata"]
{'total_size': 433245184}
```
The weights map is the main part of this index, which maps each parameter name (as usually found in a PyTorch model `state_dict`) to the file it's stored in:
```py
>>> index["weight_map"]
{'embeddings.LayerNorm.bias': 'pytorch_model-00001-of-00003.bin',
'embeddings.LayerNorm.weight': 'pytorch_model-00001-of-00003.bin',
...
```
If you want to directly load such a sharded checkpoint inside a model without using [`~PreTrainedModel.from_pretrained`] (like you would do `model.load_state_dict()` for a full checkpoint) you should use [`~modeling_utils.load_sharded_checkpoint`]:
```py
>>> from transformers.modeling_utils import load_sharded_checkpoint
>>> with tempfile.TemporaryDirectory() as tmp_dir:
... model.save_pretrained(tmp_dir, max_shard_size="200MB")
... load_sharded_checkpoint(model, tmp_dir)
```
## Low memory loading
Sharded checkpoints reduce the memory usage during step 2 of the workflow mentioned above, but in order to use that model in a low memory setting, we recommend leveraging our tools based on the Accelerate library.
Please read the following guide for more information: [Large model loading using Accelerate](./main_classes/model#large-model-loading)

View File

@ -106,7 +106,7 @@ directly upload your config to the Hub.
Now that we have our ResNet configuration, we can go on writing the model. We will actually write two: one that
extracts the hidden features from a batch of images (like [`BertModel`]) and one that is suitable for image
classification (like [`BertModelForSequenceClassification`]).
classification (like [`BertForSequenceClassification`]).
As we mentioned before, we'll only write a loose wrapper of the model to keep it simple for this example. The only
thing we need to do before writing this class is a map between the block types and actual block classes. Then the
@ -289,7 +289,7 @@ from huggingface_hub import notebook_login
notebook_login()
```
You can then push to to your own namespace (or an organization you are a member of) like this:
You can then push to your own namespace (or an organization you are a member of) like this:
```py
resnet50d.push_to_hub("custom-resnet50d")

View File

@ -44,7 +44,7 @@ specific language governing permissions and limitations under the License.
Every model is different yet bears similarities with the others. Therefore most models use the same inputs, which are
detailed here alongside usage examples.
<a id='input-ids'></a>
### Input IDs
@ -113,7 +113,7 @@ we will see
because this is the way a [`BertModel`] is going to expect its inputs.
<a id='attention-mask'></a>
### Attention mask
@ -171,7 +171,7 @@ in the dictionary returned by the tokenizer under the key "attention_mask":
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```
<a id='token-type-ids'></a>
### Token Type IDs
@ -224,7 +224,7 @@ second sequence, corresponding to the "question", has all its tokens represented
Some models, like [`XLNetModel`] use an additional token represented by a `2`.
<a id='position-ids'></a>
### Position IDs
@ -238,7 +238,7 @@ absolute positional embeddings.
Absolute positional embeddings are selected in the range `[0, config.max_position_embeddings - 1]`. Some models use
other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
<a id='labels'></a>
### Labels
@ -266,7 +266,7 @@ These labels are different according to the model head, for example:
The base models (e.g., [`BertModel`]) do not accept labels, as these are the base transformer
models, simply outputting features.
<a id='decoder-input-ids'></a>
### Decoder input IDs
@ -279,7 +279,6 @@ such models, passing the `labels` is the preferred way to handle training.
Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
<a id='feed-forward-chunking'></a>
### Feed Forward Chunking

View File

@ -12,40 +12,38 @@ specific language governing permissions and limitations under the License.
# 🤗 Transformers
State-of-the-art Machine Learning for PyTorch, TensorFlow and JAX.
State-of-the-art Machine Learning for [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), and [JAX](https://jax.readthedocs.io/en/latest/).
🤗 Transformers provides APIs to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs, carbon footprint, and save you time from training a model from scratch. The models can be used across different modalities such as:
🤗 Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs, carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as:
* 📝 Text: text classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages.
* 🖼️ Images: image classification, object detection, and segmentation.
* 🗣️ Audio: speech recognition and audio classification.
* 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
📝 **Natural Language Processing**: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.<br>
🖼️ **Computer Vision**: image classification, object detection, and segmentation.<br>
🗣️ **Audio**: automatic speech recognition and audio classification.<br>
🐙 **Multimodal**: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
Our library supports seamless integration between three of the most popular deep learning libraries: [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/) and [JAX](https://jax.readthedocs.io/en/latest/). Train your model in three lines of code in one framework, and load it for inference with another.
🤗 Transformers support framework interoperability between PyTorch, TensorFlow, and JAX. This provides the flexibility to use a different framework at each stage of a model's life; train a model in three lines of code in one framework, and load it for inference in another. Models can also be exported to a format like ONNX and TorchScript for deployment in production environments.
Each 🤗 Transformers architecture is defined in a standalone Python module so they can be easily customized for research and experiments.
Join the growing community on the [Hub](https://huggingface.co/models), [forum](https://discuss.huggingface.co/), or [Discord](https://discord.com/invite/JfAtkvEtRb) today!
## If you are looking for custom support from the Hugging Face team
<a target="_blank" href="https://huggingface.co/support">
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
<img alt="HuggingFace Expert Acceleration Program" src="https://cdn-media.huggingface.co/marketing/transformers/new-support-improved.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br>
## Contents
The documentation is organized in five parts:
The documentation is organized into five sections:
- **GET STARTED** contains a quick tour and installation instructions to get up and running with 🤗 Transformers.
- **TUTORIALS** are a great place to begin if you are new to our library. This section will help you gain the basic skills you need to start using 🤗 Transformers.
- **HOW-TO GUIDES** will show you how to achieve a specific goal like fine-tuning a pretrained model for language modeling or how to create a custom model head.
- **CONCEPTUAL GUIDES** provides more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
- **API** describes each class and function, grouped in:
- **GET STARTED** provides a quick tour of the library and installation instructions to get up and running.
- **TUTORIALS** are a great place to start if you're a beginner. This section will help you gain the basic skills you need to start using the library.
- **HOW-TO GUIDES** show you how to achieve a specific goal, like finetuning a pretrained model for language modeling or how to write and share a custom model.
- **CONCEPTUAL GUIDES** offers more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
- **API** describes all classes and functions:
- **MAIN CLASSES** for the main classes exposing the important APIs of the library.
- **MODELS** for the classes and functions related to each model implemented in the library.
- **INTERNAL HELPERS** for the classes and functions we use internally.
The library currently contains JAX, PyTorch and TensorFlow implementations, pretrained model weights, usage scripts and conversion utilities for the following models.
- **MAIN CLASSES** details the most important classes like configuration, model, tokenizer, and pipeline.
- **MODELS** details the classes and functions related to each model implemented in the library.
- **INTERNAL HELPERS** details utility classes and functions used internally.
### Supported models
@ -57,74 +55,95 @@ The library currently contains JAX, PyTorch and TensorFlow implementations, pret
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BERT For Sequence Generation](model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ERNIE](model_doc/ernie)** (from Baidu) released with the paper [ERNIE: Enhanced Representation through Knowledge Integration](https://arxiv.org/abs/1904.09223) by Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian, Hua Wu.
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GroupViT](model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutXLM](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LayoutLMv3](model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](model_doc/layoutxlm)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[MBart](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[mBART](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[MVP](model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](model_doc/nezha)** (from Huawei Noahs Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[PEGASUS-X](model_doc/pegasus_x)** (from Google) released with the paper [Investigating Efficiently Extending Transformers for Long Input Summarization](https://arxiv.org/abs/2208.04347) by Jason Phang, Yao Zhao, and Peter J. Liu.
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[RegNet](model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
@ -134,32 +153,39 @@ The library currently contains JAX, PyTorch and TensorFlow implementations, pret
1. **[SpeechToTextTransformer](model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[SqueezeBERT](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[X-CLIP](model_doc/xclip)** (from Microsoft Research) released with the paper [Expanding Language-Image Pretrained Models for General Video Recognition](https://arxiv.org/abs/2208.02816) by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling.
1. **[XGLM](model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[XLS-R](model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOLOS](model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
@ -179,70 +205,89 @@ Flax), PyTorch, and/or TensorFlow.
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
| BigBird | ✅ | ✅ | ✅ | ❌ | ✅ |
| BigBirdPegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
| BigBird-Pegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
| Blenderbot | ✅ | ✅ | ✅ | ✅ | ✅ |
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ |
| BLOOM | ❌ | ✅ | ✅ | ❌ | ❌ |
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| Canine | ✅ | ❌ | ✅ | ❌ | ❌ |
| CANINE | ✅ | ❌ | ✅ | ❌ | ❌ |
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ |
| CodeGen | ✅ | ✅ | ✅ | ❌ | ❌ |
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ConvNext | ❌ | ❌ | ✅ | ✅ | ❌ |
| ConvNeXT | ❌ | ❌ | ✅ | ✅ | ❌ |
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
| CvT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecAudio | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecText | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecVision | ❌ | ❌ | ✅ | ✅ | ❌ |
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
| DeBERTa-v2 | ✅ | | ✅ | ✅ | ❌ |
| DeBERTa-v2 | ✅ | | ✅ | ✅ | ❌ |
| Decision Transformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeiT | ❌ | ❌ | ✅ | | ❌ |
| DeiT | ❌ | ❌ | ✅ | | ❌ |
| DETR | ❌ | ❌ | ✅ | ❌ | ❌ |
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| DonutSwin | ❌ | ❌ | ✅ | ❌ | ❌ |
| DPR | ✅ | ✅ | ✅ | ✅ | ❌ |
| DPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ELECTRA | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| ERNIE | ❌ | ❌ | ✅ | ❌ | ❌ |
| FairSeq Machine-Translation | ✅ | ❌ | ✅ | ❌ | ❌ |
| FlauBERT | ✅ | ❌ | ✅ | ✅ | ❌ |
| FLAVA | ❌ | ❌ | ✅ | ❌ | ❌ |
| FNet | ✅ | ✅ | ✅ | ❌ | ❌ |
| Funnel Transformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| GLPN | ❌ | ❌ | ✅ | ❌ | ❌ |
| GPT Neo | ❌ | ❌ | ✅ | ❌ | ✅ |
| GPT NeoX | ❌ | ✅ | ✅ | ❌ | ❌ |
| GPT-J | ❌ | ❌ | ✅ | ✅ | ✅ |
| GroupViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Hubert | ❌ | ❌ | ✅ | ✅ | ❌ |
| I-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ImageGPT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LayoutLM | ✅ | ✅ | ✅ | ✅ | ❌ |
| LayoutLMv2 | ✅ | ✅ | ✅ | ❌ | ❌ |
| LayoutLMv3 | ✅ | ✅ | ✅ | ✅ | ❌ |
| LED | ✅ | ✅ | ✅ | ✅ | ❌ |
| LeViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Longformer | ✅ | ✅ | ✅ | ✅ | ❌ |
| LongT5 | ❌ | ❌ | ✅ | ❌ | ✅ |
| LUKE | ✅ | ❌ | ✅ | ❌ | ❌ |
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| M-CTC-T | ❌ | ❌ | ✅ | ❌ | ❌ |
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ |
| MaskFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ |
| MegatronBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| Megatron-BERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| MobileViT | ❌ | ❌ | ✅ | ✅ | ❌ |
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| mT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Nystromformer | | | ✅ | ❌ | ❌ |
| MT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| MVP | | | ✅ | ❌ | ❌ |
| Nezha | ❌ | ❌ | ✅ | ❌ | ❌ |
| Nyströmformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| OPT | ❌ | ❌ | ✅ | ✅ | ✅ |
| OWL-ViT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
| PEGASUS-X | ❌ | ❌ | ✅ | ❌ | ❌ |
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ |
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ |
| PoolFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| QDQBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ |
| Realm | ✅ | ✅ | ✅ | ❌ | ❌ |
| REALM | ✅ | ✅ | ✅ | ❌ | ❌ |
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
| RegNet | ❌ | ❌ | ✅ | | ❌ |
| RegNet | ❌ | ❌ | ✅ | | ❌ |
| RemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ResNet | ❌ | ❌ | ✅ | | ❌ |
| ResNet | ❌ | ❌ | ✅ | | ❌ |
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ |
| SegFormer | ❌ | ❌ | ✅ | | ❌ |
| SegFormer | ❌ | ❌ | ✅ | | ❌ |
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ |
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ |
@ -250,29 +295,34 @@ Flax), PyTorch, and/or TensorFlow.
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ |
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| Swin | ❌ | ❌ | ✅ | | ❌ |
| Swin Transformer | ❌ | ❌ | ✅ | | ❌ |
| Swin Transformer V2 | ❌ | ❌ | ✅ | ❌ | ❌ |
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| TAPAS | ✅ | ❌ | ✅ | ✅ | ❌ |
| TAPEX | | | ✅ | | |
| Trajectory Transformer | | | ✅ | | |
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
| TrOCR | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeech | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeechSat | ❌ | ❌ | ✅ | ❌ | ❌ |
| VAN | ❌ | ❌ | ✅ | ❌ | ❌ |
| VideoMAE | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViLT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Vision Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| VisionTextDualEncoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| VisualBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| VisualBERT | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ |
| ViTMAE | ❌ | ❌ | ✅ | ✅ | ❌ |
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ |
| Wav2Vec2-Conformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| WavLM | ❌ | ❌ | ✅ | ❌ | ❌ |
| XGLM | | | ✅ | ❌ | |
| X-CLIP | | | ✅ | ❌ | |
| XGLM | ✅ | ✅ | ✅ | ✅ | ✅ |
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
| XLM-ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| XLM-RoBERTa-XL | ❌ | ❌ | ✅ | ❌ | ❌ |
| XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| YOLOS | ❌ | ❌ | ✅ | ❌ | ❌ |
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |
<!-- End table-->

View File

@ -34,11 +34,16 @@ Start by creating a virtual environment in your project directory:
python -m venv .env
```
Activate the virtual environment:
Activate the virtual environment. On Linux and MacOs:
```bash
source .env/bin/activate
```
Activate Virtual environment on Windows
```bash
.env/Scripts/activate
```
Now you're ready to install 🤗 Transformers with the following command:
@ -134,11 +139,11 @@ conda install -c huggingface transformers
## Cache setup
Pretrained models are downloaded and locally cached at: `~/.cache/huggingface/transformers/`. This is the default directory given by the shell environment variable `TRANSFORMERS_CACHE`. On Windows, the default directory is given by `C:\Users\username\.cache\huggingface\transformers`. You can change the shell environment variables shown below - in order of priority - to specify a different cache directory:
Pretrained models are downloaded and locally cached at: `~/.cache/huggingface/hub`. This is the default directory given by the shell environment variable `TRANSFORMERS_CACHE`. On Windows, the default directory is given by `C:\Users\username\.cache\huggingface\hub`. You can change the shell environment variables shown below - in order of priority - to specify a different cache directory:
1. Shell environment variable (default): `TRANSFORMERS_CACHE`.
2. Shell environment variable: `HF_HOME` + `transformers/`.
3. Shell environment variable: `XDG_CACHE_HOME` + `/huggingface/transformers`.
1. Shell environment variable (default): `HUGGINGFACE_HUB_CACHE` or `TRANSFORMERS_CACHE`.
2. Shell environment variable: `HF_HOME`.
3. Shell environment variable: `XDG_CACHE_HOME` + `/huggingface`.
<Tip>

View File

@ -127,6 +127,9 @@ generation.
[[autodoc]] TopKLogitsWarper
- __call__
[[autodoc]] TypicalLogitsWarper
- __call__
[[autodoc]] NoRepeatNGramLogitsProcessor
- __call__

View File

@ -22,6 +22,8 @@ Most of those are only useful if you are studying the code of the Trainer in the
[[autodoc]] IntervalStrategy
[[autodoc]] enable_full_determinism
[[autodoc]] set_seed
[[autodoc]] torch_distributed_zero_first

View File

@ -32,6 +32,7 @@ By default a [`Trainer`] will use the following callbacks:
- [`~integrations.WandbCallback`] if [wandb](https://www.wandb.com/) is installed.
- [`~integrations.CometCallback`] if [comet_ml](https://www.comet.ml/site/) is installed.
- [`~integrations.MLflowCallback`] if [mlflow](https://www.mlflow.org/) is installed.
- [`~integrations.NeptuneCallback`] if [neptune](https://neptune.ai/) is installed.
- [`~integrations.AzureMLCallback`] if [azureml-sdk](https://pypi.org/project/azureml-sdk/) is
installed.
- [`~integrations.CodeCarbonCallback`] if [codecarbon](https://pypi.org/project/codecarbon/) is
@ -70,6 +71,8 @@ Here is the list of the available [`TrainerCallback`] in the library:
[[autodoc]] integrations.CodeCarbonCallback
[[autodoc]] integrations.NeptuneCallback
## TrainerCallback
[[autodoc]] TrainerCallback

View File

@ -37,7 +37,7 @@ won't be possible on a single GPU.
2. If you don't use [`Trainer`] and want to use your own Trainer where you integrated DeepSpeed
yourself, core functionality functions like `from_pretrained` and `from_config` include integration of essential
parts of DeepSpeed like `zero.Init` for ZeRO stage 3 and higher. To tap into this feature read the docs on
[deepspeed-non-trainer-integration](#deepspeed-non-trainer-integration).
[non-Trainer DeepSpeed Integration](#nontrainer-deepspeed-integration).
What is integrated:
@ -1849,7 +1849,6 @@ In this case you usually need to raise the value of `initial_scale_power`. Setti
<a id='deepspeed-non-trainer-integration'></a>
## Non-Trainer Deepspeed Integration

View File

@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# Feature Extractor
A feature extractor is in charge of preparing input features for a multi-modal model. This includes feature extraction
A feature extractor is in charge of preparing input features for audio or vision models. This includes feature extraction
from sequences, *e.g.*, pre-processing audio files to Log-Mel Spectrogram features, feature extraction from images
*e.g.* cropping image image files, but also padding, normalization, and conversion to Numpy, PyTorch, and TensorFlow
tensors.

View File

@ -40,29 +40,17 @@ Additionally, some `warnings` can be disabled by setting the environment variabl
TRANSFORMERS_NO_ADVISORY_WARNINGS=1 ./myprogram.py
```
Here is an example of how to use `logging` in a module:
Here is an example of how to use the same logger as the library in your own module or script:
```python
from transformers.utils import logging
logging.set_verbosity_info()
logger = logging.get_logger(__name__)
logger = logging.get_logger("transformers")
logger.info("INFO")
logger.warning("WARN")
```
Above, a `logger` instance is created from `logging.get_logger(__name__)`. If you want to use `logging` in a script, you shouldn't pass `__name__` to `logging.get_logger`. For example:
```python
from transformers.utils import logging
if __name__ == "__main__":
logging.set_verbosity_info()
# leave it empy or use a string
logger = logging.get_logger()
logger.info("INFO")
logger.warning("WARN")
```
All the methods of this logging module are documented below, the main ones are
[`logging.get_verbosity`] to get the current level of verbosity in the logger and

View File

@ -38,6 +38,75 @@ for text generation, [`~generation_utils.GenerationMixin`] (for the PyTorch mode
<a id='from_pretrained-torch-dtype'></a>
### Large model loading
In Transformers 4.20.0, the [`~PreTrainedModel.from_pretrained`] method has been reworked to accommodate large models using [Accelerate](https://huggingface.co/docs/accelerate/big_modeling). This requires Accelerate >= 0.9.0 and PyTorch >= 1.9.0. Instead of creating the full model, then loading the pretrained weights inside it (which takes twice the size of the model in RAM, one for the randomly initialized model, one for the weights), there is an option to create the model as an empty shell, then only materialize its parameters when the pretrained weights are loaded.
This option can be activated with `low_cpu_mem_usage=True`. The model is first created on the Meta device (with empty weights) and the state dict is then loaded inside it (shard by shard in the case of a sharded checkpoint). This way the maximum RAM used is the full size of the model only.
```py
from transformers import AutoModelForSeq2SeqLM
t0pp = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp", low_cpu_mem_usage=True)
```
Moreover, you can directly place the model on different devices if it doesn't fully fit in RAM (only works for inference for now). With `device_map="auto"`, Accelerate will determine where to put each layer to maximize the use of your fastest devices (GPUs) and offload the rest on the CPU, or even the hard drive if you don't have enough GPU RAM (or CPU RAM). Even if the model is split across several devices, it will run as you would normally expect.
When passing a `device_map`, `low_cpu_mem_usage` is automatically set to `True`, so you don't need to specify it:
```py
from transformers import AutoModelForSeq2SeqLM
t0pp = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp", device_map="auto")
```
You can inspect how the model was split across devices by looking at its `hf_device_map` attribute:
```py
t0pp.hf_device_map
```
```python out
{'shared': 0,
'decoder.embed_tokens': 0,
'encoder': 0,
'decoder.block.0': 0,
'decoder.block.1': 1,
'decoder.block.2': 1,
'decoder.block.3': 1,
'decoder.block.4': 1,
'decoder.block.5': 1,
'decoder.block.6': 1,
'decoder.block.7': 1,
'decoder.block.8': 1,
'decoder.block.9': 1,
'decoder.block.10': 1,
'decoder.block.11': 1,
'decoder.block.12': 1,
'decoder.block.13': 1,
'decoder.block.14': 1,
'decoder.block.15': 1,
'decoder.block.16': 1,
'decoder.block.17': 1,
'decoder.block.18': 1,
'decoder.block.19': 1,
'decoder.block.20': 1,
'decoder.block.21': 1,
'decoder.block.22': 'cpu',
'decoder.block.23': 'cpu',
'decoder.final_layer_norm': 'cpu',
'decoder.dropout': 'cpu',
'lm_head': 'cpu'}
```
You can also write your own device map following the same format (a dictionary layer name to device). It should map all parameters of the model to a given device, but you don't have to detail where all the submosules of one layer go if that layer is entirely on the same device. For instance, the following device map would work properly for T0pp (as long as you have the GPU memory):
```python
device_map = {"shared": 0, "encoder": 0, "decoder": 1, "lm_head": 1}
```
Another way to minimize the memory impact of your model is to instantiate it at a lower precision dtype (like `torch.float16`) or use direct quantization techniques as described below.
### Model Instantiation dtype
Under Pytorch a model normally gets instantiated with `torch.float32` format. This can be an issue if one tries to
@ -65,7 +134,6 @@ model = AutoModel.from_config(config)
Due to Pytorch design, this functionality is only available for floating dtypes.
## ModuleUtilsMixin
[[autodoc]] modeling_utils.ModuleUtilsMixin
@ -89,3 +157,7 @@ Due to Pytorch design, this functionality is only available for floating dtypes.
## Pushing to the Hub
[[autodoc]] utils.PushToHubMixin
## Sharded checkpoints
[[autodoc]] modeling_utils.load_sharded_checkpoint

View File

@ -16,7 +16,7 @@ All models have outputs that are instances of subclasses of [`~utils.ModelOutput
data structures containing all the information returned by the model, but that can also be used as tuples or
dictionaries.
Let's see of this looks on an example:
Let's see how this looks in an example:
```python
from transformers import BertTokenizer, BertForSequenceClassification
@ -136,6 +136,30 @@ documented on their corresponding model page.
[[autodoc]] modeling_outputs.Seq2SeqQuestionAnsweringModelOutput
## SemanticSegmenterOutput
[[autodoc]] modeling_outputs.SemanticSegmenterOutput
## ImageClassifierOutput
[[autodoc]] modeling_outputs.ImageClassifierOutput
## ImageClassifierOutputWithNoAttention
[[autodoc]] modeling_outputs.ImageClassifierOutputWithNoAttention
## DepthEstimatorOutput
[[autodoc]] modeling_outputs.DepthEstimatorOutput
## Wav2Vec2BaseModelOutput
[[autodoc]] modeling_outputs.Wav2Vec2BaseModelOutput
## XVectorOutput
[[autodoc]] modeling_outputs.XVectorOutput
## TFBaseModelOutput
[[autodoc]] modeling_tf_outputs.TFBaseModelOutput

View File

@ -25,10 +25,12 @@ There are two categories of pipeline abstractions to be aware about:
- [`AudioClassificationPipeline`]
- [`AutomaticSpeechRecognitionPipeline`]
- [`ConversationalPipeline`]
- [`DocumentQuestionAnsweringPipeline`]
- [`FeatureExtractionPipeline`]
- [`FillMaskPipeline`]
- [`ImageClassificationPipeline`]
- [`ImageSegmentationPipeline`]
- [`ImageToTextPipeline`]
- [`ObjectDetectionPipeline`]
- [`QuestionAnsweringPipeline`]
- [`SummarizationPipeline`]
@ -38,6 +40,7 @@ There are two categories of pipeline abstractions to be aware about:
- [`Text2TextGenerationPipeline`]
- [`TokenClassificationPipeline`]
- [`TranslationPipeline`]
- [`VisualQuestionAnsweringPipeline`]
- [`ZeroShotClassificationPipeline`]
- [`ZeroShotImageClassificationPipeline`]
@ -340,6 +343,12 @@ That should enable you to do all the custom code you want.
- __call__
- all
### DocumentQuestionAnsweringPipeline
[[autodoc]] DocumentQuestionAnsweringPipeline
- __call__
- all
### FeatureExtractionPipeline
[[autodoc]] FeatureExtractionPipeline
@ -364,6 +373,12 @@ That should enable you to do all the custom code you want.
- __call__
- all
### ImageToTextPipeline
[[autodoc]] ImageToTextPipeline
- __call__
- all
### NerPipeline
[[autodoc]] NerPipeline
@ -423,6 +438,12 @@ See [`TokenClassificationPipeline`] for all details.
- __call__
- all
### VisualQuestionAnsweringPipeline
[[autodoc]] VisualQuestionAnsweringPipeline
- __call__
- all
### ZeroShotClassificationPipeline
[[autodoc]] ZeroShotClassificationPipeline

View File

@ -18,9 +18,7 @@ Rust library [🤗 Tokenizers](https://github.com/huggingface/tokenizers). The "
1. a significant speed-up in particular when doing batched tokenization and
2. additional methods to map between the original string (character and words) and the token space (e.g. getting the
index of the token comprising a given character or the span of characters corresponding to a given token). Currently
no "Fast" implementation is available for the SentencePiece-based tokenizers (for T5, ALBERT, CamemBERT, XLM-RoBERTa
and XLNet models).
index of the token comprising a given character or the span of characters corresponding to a given token).
The base classes [`PreTrainedTokenizer`] and [`PreTrainedTokenizerFast`]
implement the common methods for encoding string inputs in model inputs (see below) and instantiating/saving python and

View File

@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.
# Trainer
The [`Trainer`] class provides an API for feature-complete training in PyTorch for most standard use cases. It's used in most of the [example scripts](../examples).
The [`Trainer`] class provides an API for feature-complete training in PyTorch for most standard use cases. It's used in most of the [example scripts](https://github.com/huggingface/transformers/tree/main/examples).
Before instantiating your [`Trainer`], create a [`TrainingArguments`] to access all the points of customization during training.
@ -291,10 +291,10 @@ Also if you do set this environment variable it's the best to set it in your `~/
The [`Trainer`] has been extended to support libraries that may dramatically improve your training
time and fit much bigger models.
Currently it supports third party solutions, [DeepSpeed](https://github.com/microsoft/DeepSpeed) and [FairScale](https://github.com/facebookresearch/fairscale/), which implement parts of the paper [ZeRO: Memory Optimizations
Currently it supports third party solutions, [DeepSpeed](https://github.com/microsoft/DeepSpeed), [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html) and [FairScale](https://github.com/facebookresearch/fairscale/), which implement parts of the paper [ZeRO: Memory Optimizations
Toward Training Trillion Parameter Models, by Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He](https://arxiv.org/abs/1910.02054).
This provided support is new and experimental as of this writing.
This provided support is new and experimental as of this writing. While the support for DeepSpeed and PyTorch FSDP is active and we welcome issues around it, we don't support the FairScale integration anymore since it has been integrated in PyTorch main (see the [PyTorch FSDP integration](#pytorch-fully-sharded-data-parallel))
<a id='zero-install-notes'></a>
@ -408,6 +408,12 @@ As always make sure to edit the paths in the example to match your situation.
### FairScale
<Tip warning={true}>
This integration is not supported anymore, we recommend you either use DeepSpeed or PyTorch FSDP.
</Tip>
By integrating [FairScale](https://github.com/facebookresearch/fairscale/) the [`Trainer`]
provides support for the following features from [the ZeRO paper](https://arxiv.org/abs/1910.02054):
@ -540,6 +546,110 @@ Known caveats:
`FullyShardedDataParallelism` of fairscale. It should be used with the option `auto_wrap` if you are not
doing this yourself: `--sharded_ddp "zero_dp_3 auto_wrap"`.
### PyTorch Fully Sharded Data parallel
To accelerate training huge models on larger batch sizes, we can use a fully sharded data parallel model.
This type of data parallel paradigm enables fitting more data and larger models by sharding the optimizer states, gradients and parameters.
To read more about it and the benefits, check out the [Fully Sharded Data Parallel blog](https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/).
We have integrated the latest PyTorch's Fully Sharded Data Parallel (FSDP) training feature.
All you need to do is enable it through the config.
**Required PyTorch version for FSDP support**: PyTorch Nightly (or 1.12.0 if you read this after it has been released)
as the model saving with FSDP activated is only available with recent fixes.
**Usage**:
- Make sure you have added the distributed launcher
`-m torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE` if you haven't been using it already.
- **Sharding Strategy**:
- FULL_SHARD : Shards optimizer states + gradients + model parameters across data parallel workers/GPUs.
For this, add `--fsdp full_shard` to the command line arguments.
- SHARD_GRAD_OP : Shards optimizer states + gradients across data parallel workers/GPUs.
For this, add `--fsdp shard_grad_op` to the command line arguments.
- NO_SHARD : No sharding. For this, add `--fsdp no_shard` to the command line arguments.
- To offload the parameters and gradients to the CPU,
add `--fsdp "full_shard offload"` or `--fsdp "shard_grad_op offload"` to the command line arguments.
- To automatically recursively wrap layers with FSDP using `default_auto_wrap_policy`,
add `--fsdp "full_shard auto_wrap"` or `--fsdp "shard_grad_op auto_wrap"` to the command line arguments.
- To enable both CPU offloading and auto wrapping,
add `--fsdp "full_shard offload auto_wrap"` or `--fsdp "shard_grad_op offload auto_wrap"` to the command line arguments.
- If auto wrapping is enabled, you can either use transformer based auto wrap policy or size based auto wrap policy.
- For transformer based auto wrap policy, please add `--fsdp_transformer_layer_cls_to_wrap <value>` to command line arguments.
This specifies the transformer layer class name (case-sensitive) to wrap ,e.g, `BertLayer`, `GPTJBlock`, `T5Block` ....
This is important because submodules that share weights (e.g., embedding layer) should not end up in different FSDP wrapped units.
Using this policy, wrapping happens for each block containing Multi-Head Attention followed by couple of MLP layers.
Remaining layers including the shared embeddings are conviniently wrapped in same outermost FSDP unit.
Therefore, use this for transformer based models.
- For size based auto wrap policy, please add `--fsdp_min_num_params <number>` to command line arguments.
It specifies FSDP's minimum number of parameters for auto wrapping.
**Few caveats to be aware of**
- Mixed precision is currently not supported with FSDP as we wait for PyTorch to fix support for it.
More details in this [issues](https://github.com/pytorch/pytorch/issues/75676).
- FSDP currently doesn't support multiple parameter groups.
More details mentioned in this [issue](https://github.com/pytorch/pytorch/issues/76501)
(`The original model parameters' .grads are not set, meaning that they cannot be optimized separately (which is why we cannot support multiple parameter groups)`).
### Using Trainer for accelerated PyTorch Training on Mac
With PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training.
This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on Mac.
Apple's Metal Performance Shaders (MPS) as a backend for PyTorch enables this and can be used via the new `"mps"` device.
This will map computational graphs and primitives on the MPS Graph framework and tuned kernels provided by MPS.
For more information please refer official documents [Introducing Accelerated PyTorch Training on Mac](https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/)
and [MPS BACKEND](https://pytorch.org/docs/stable/notes/mps.html).
<Tip warning={false}>
We strongly recommend to install PyTorch >= 1.13 (nightly version at the time of writing) on your MacOS machine.
It has major fixes related to model correctness and performance improvements for transformer based models.
Please refer to https://github.com/pytorch/pytorch/issues/82707 for more details.
</Tip>
**Benefits of Training and Inference using Apple Silicon Chips**
1. Enables users to train larger networks or batch sizes locally
2. Reduces data retrieval latency and provides the GPU with direct access to the full memory store due to unified memory architecture.
Therefore, improving end-to-end performance.
3. Reduces costs associated with cloud-based development or the need for additional local GPUs.
**Pre-requisites**: To install torch with mps support,
please follow this nice medium article [GPU-Acceleration Comes to PyTorch on M1 Macs](https://medium.com/towards-data-science/gpu-acceleration-comes-to-pytorch-on-m1-macs-195c399efcc1).
**Usage**:
User has to just pass `--use_mps_device` argument.
For example, you can run the offical Glue text classififcation task (from the root folder) using Apple Silicon GPU with below command:
```bash
export TASK_NAME=mrpc
python examples/pytorch/text-classification/run_glue.py \
--model_name_or_path bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/ \
--use_mps_device \
--overwrite_output_dir
```
**A few caveats to be aware of**
1. Some PyTorch operations have not been implemented in mps and will throw an error.
One way to get around that is to set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1`,
which will fallback to CPU for these operations. It still throws a UserWarning however.
2. Distributed setups `gloo` and `nccl` are not working with `mps` device.
This means that currently only single GPU of `mps` device type can be used.
Finally, please, remember that, 🤗 `Trainer` only integrates MPS backend, therefore if you
have any problems or questions with regards to MPS backend usage, please,
file an issue with [PyTorch GitHub](https://github.com/pytorch/pytorch/issues).
Sections that were moved:

View File

@ -114,14 +114,26 @@ Likewise, if your `NewModel` is a subclass of [`PreTrainedModel`], make sure its
[[autodoc]] AutoModelForTableQuestionAnswering
## AutoModelForDocumentQuestionAnswering
[[autodoc]] AutoModelForDocumentQuestionAnswering
## AutoModelForImageClassification
[[autodoc]] AutoModelForImageClassification
## AutoModelForVideoClassification
[[autodoc]] AutoModelForVideoClassification
## AutoModelForVision2Seq
[[autodoc]] AutoModelForVision2Seq
## AutoModelForVisualQuestionAnswering
[[autodoc]] AutoModelForVisualQuestionAnswering
## AutoModelForAudioClassification
[[autodoc]] AutoModelForAudioClassification
@ -178,6 +190,10 @@ Likewise, if your `NewModel` is a subclass of [`PreTrainedModel`], make sure its
[[autodoc]] TFAutoModelForImageClassification
## TFAutoModelForSemanticSegmentation
[[autodoc]] TFAutoModelForSemanticSegmentation
## TFAutoModelForMaskedLM
[[autodoc]] TFAutoModelForMaskedLM
@ -194,10 +210,18 @@ Likewise, if your `NewModel` is a subclass of [`PreTrainedModel`], make sure its
[[autodoc]] TFAutoModelForMultipleChoice
## TFAutoModelForNextSentencePrediction
[[autodoc]] TFAutoModelForNextSentencePrediction
## TFAutoModelForTableQuestionAnswering
[[autodoc]] TFAutoModelForTableQuestionAnswering
## TFAutoModelForDocumentQuestionAnswering
[[autodoc]] TFAutoModelForDocumentQuestionAnswering
## TFAutoModelForTokenClassification
[[autodoc]] TFAutoModelForTokenClassification

View File

@ -49,7 +49,7 @@ Usage:
>>> input_ids = tokenizer(
... "This is a long article to summarize", add_special_tokens=False, return_tensors="pt"
>>> ).input_ids
... ).input_ids
>>> labels = tokenizer("This is a short summary", return_tensors="pt").input_ids
>>> # train...
@ -67,7 +67,7 @@ Usage:
>>> input_ids = tokenizer(
... "This is the first sentence. This is the second sentence.", add_special_tokens=False, return_tensors="pt"
>>> ).input_ids
... ).input_ids
>>> outputs = sentence_fuser.generate(input_ids)

View File

@ -58,6 +58,10 @@ This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The o
[[autodoc]] BertTokenizerFast
## TFBertTokenizer
[[autodoc]] TFBertTokenizer
## Bert specific outputs
[[autodoc]] models.bert.modeling_bert.BertForPreTrainingOutput
@ -166,6 +170,11 @@ This model was contributed by [thomwolf](https://huggingface.co/thomwolf). The o
[[autodoc]] FlaxBertForPreTraining
- __call__
## FlaxBertForCausalLM
[[autodoc]] FlaxBertForCausalLM
- __call__
## FlaxBertForMaskedLM
[[autodoc]] FlaxBertForMaskedLM

View File

@ -120,6 +120,11 @@ This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta
[[autodoc]] FlaxBigBirdForPreTraining
- __call__
## FlaxBigBirdForCausalLM
[[autodoc]] FlaxBigBirdForCausalLM
- __call__
## FlaxBigBirdForMaskedLM
[[autodoc]] FlaxBigBirdForMaskedLM

View File

@ -0,0 +1,57 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# BLOOM
## Overview
The BLOOM model has been proposed with its various versions through the [BigScience Workshop](https://bigscience.huggingface.co/). BigScience is inspired by other open science initiatives where researchers have pooled their time and resources to collectively achieve a higher impact.
The architecture of BLOOM is essentially similar to GPT3 (auto-regressive model for next token prediction), but has been trained on 46 different languages and 13 programming languages.
Several smaller versions of the models have been trained on the same dataset. BLOOM is available in the following versions:
- [bloom-560m](https://huggingface.co/bigscience/bloom-560m)
- [bloom-1b1](https://huggingface.co/bigscience/bloom-1b1)
- [bloom-1b7](https://huggingface.co/bigscience/bloom-1b7)
- [bloom-3b](https://huggingface.co/bigscience/bloom-3b)
- [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1)
- [bloom](https://huggingface.co/bigscience/bloom) (176B parameters)
## BloomConfig
[[autodoc]] BloomConfig
- all
## BloomModel
[[autodoc]] BloomModel
- forward
## BloomTokenizerFast
[[autodoc]] BloomTokenizerFast
- all
## BloomForCausalLM
[[autodoc]] BloomForCausalLM
- forward
## BloomForSequenceClassification
[[autodoc]] BloomForSequenceClassification
- forward
## BloomForTokenClassification
[[autodoc]] BloomForTokenClassification
- forward

View File

@ -0,0 +1,81 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# CodeGen
## Overview
The CodeGen model was proposed in [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong.
CodeGen is an autoregressive language model for program synthesis trained sequentially on [The Pile](https://pile.eleuther.ai/), BigQuery, and BigPython.
The abstract from the paper is the following:
*Program synthesis strives to generate a computer program as a solution to a given problem specification. We propose a conversational program synthesis approach via large language models, which addresses the challenges of searching over a vast program space and user intent specification faced in prior approaches. Our new approach casts the process of writing a specification and program as a multi-turn conversation between a user and a system. It treats program synthesis as a sequence prediction problem, in which the specification is expressed in natural language and the desired program is conditionally sampled. We train a family of large language models, called CodeGen, on natural language and programming language data. With weak supervision in the data and the scaling up of data size and model size, conversational capacities emerge from the simple autoregressive language modeling. To study the model behavior on conversational program synthesis, we develop a multi-turn programming benchmark (MTPB), where solving each problem requires multi-step synthesis via multi-turn conversation between the user and the model. Our findings show the emergence of conversational capabilities and the effectiveness of the proposed conversational program synthesis paradigm. In addition, our model CodeGen (with up to 16B parameters trained on TPU-v4) outperforms OpenAI's Codex on the HumanEval benchmark. We make the training library JaxFormer including checkpoints available as open source contribution: [this https URL](https://github.com/salesforce/codegen).*
This model was contributed by [Hiroaki Hayashi](https://huggingface.co/rooa).
The original code can be found [here](https://github.com/salesforce/codegen).
## Checkpoint Naming
* CodeGen model [checkpoints](https://huggingface.co/models?other=codegen) are available on different pre-training data with variable sizes.
* The format is: `Salesforce/codegen-{size}-{data}`, where
* `size`: `350M`, `2B`, `6B`, `16B`
* `data`:
* `nl`: Pre-trained on the Pile
* `multi`: Initialized with `nl`, then further pre-trained on multiple programming languages data
* `mono`: Initialized with `multi`, then further pre-trained on Python data
* For example, `Salesforce/codegen-350M-mono` offers a 350 million-parameter checkpoint pre-trained sequentially on the Pile, multiple programming languages, and Python.
## How to use
```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> checkpoint = "Salesforce/codegen-350M-mono"
>>> model = AutoModelForCausalLM.from_pretrained(checkpoint)
>>> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
>>> text = "def hello_world():"
>>> completion = model.generate(**tokenizer(text, return_tensors="pt"))
>>> print(tokenizer.decode(completion[0]))
def hello_world():
print("Hello World")
hello_world()
```
## CodeGenConfig
[[autodoc]] CodeGenConfig
- all
## CodeGenTokenizer
[[autodoc]] CodeGenTokenizer
- save_vocabulary
## CodeGenTokenizerFast
[[autodoc]] CodeGenTokenizerFast
## CodeGenModel
[[autodoc]] CodeGenModel
- forward
## CodeGenForCausalLM
[[autodoc]] CodeGenForCausalLM
- forward

View File

@ -38,3 +38,7 @@ Note: We only have a tokenizer here, since the model architecture is the same as
## CpmTokenizer
[[autodoc]] CpmTokenizer
## CpmTokenizerFast
[[autodoc]] CpmTokenizerFast

View File

@ -0,0 +1,53 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Convolutional Vision Transformer (CvT)
## Overview
The CvT model was proposed in [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan and Lei Zhang. The Convolutional vision Transformer (CvT) improves the [Vision Transformer (ViT)](vit) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.
The abstract from the paper is the following:
*We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT)
in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. This is accomplished through
two primary modifications: a hierarchy of Transformers containing a new convolutional token embedding, and a convolutional Transformer
block leveraging a convolutional projection. These changes introduce desirable properties of convolutional neural networks (CNNs)
to the ViT architecture (\ie shift, scale, and distortion invariance) while maintaining the merits of Transformers (\ie dynamic attention,
global context, and better generalization). We validate CvT by conducting extensive experiments, showing that this approach achieves
state-of-the-art performance over other Vision Transformers and ResNets on ImageNet-1k, with fewer parameters and lower FLOPs. In addition,
performance gains are maintained when pretrained on larger datasets (\eg ImageNet-22k) and fine-tuned to downstream tasks. Pre-trained on
ImageNet-22k, our CvT-W24 obtains a top-1 accuracy of 87.7\% on the ImageNet-1k val set. Finally, our results show that the positional encoding,
a crucial component in existing Vision Transformers, can be safely removed in our model, simplifying the design for higher resolution vision tasks.*
Tips:
- CvT models are regular Vision Transformers, but trained with convolutions. They outperform the [original model (ViT)](vit) when fine-tuned on ImageNet-1K and CIFAR-100.
- You can check out demo notebooks regarding inference as well as fine-tuning on custom data [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer) (you can just replace [`ViTFeatureExtractor`] by [`AutoFeatureExtractor`] and [`ViTForImageClassification`] by [`CvtForImageClassification`]).
- The available checkpoints are either (1) pre-trained on [ImageNet-22k](http://www.image-net.org/) (a collection of 14 million images and 22k classes) only, (2) also fine-tuned on ImageNet-22k or (3) also fine-tuned on [ImageNet-1k](http://www.image-net.org/challenges/LSVRC/2012/) (also referred to as ILSVRC 2012, a collection of 1.3 million
images and 1,000 classes).
This model was contributed by [anugunj](https://huggingface.co/anugunj). The original code can be found [here](https://github.com/microsoft/CvT).
## CvtConfig
[[autodoc]] CvtConfig
## CvtModel
[[autodoc]] CvtModel
- forward
## CvtForImageClassification
[[autodoc]] CvtForImageClassification
- forward

View File

@ -33,11 +33,19 @@ Models and code are available at www.github.com/pytorch/fairseq/tree/master/exam
Tips:
- Both Data2VecAudio and Data2VecText have been trained using the same self-supervised learning method.
In the case of Data2VecAudio, preprocessing is identical to [`RobertaModel`], including tokenization.
- Data2VecAudio, Data2VecText, and Data2VecVision have all been trained using the same self-supervised learning method.
- For Data2VecAudio, preprocessing is identical to [`Wav2Vec2Model`], including feature extraction
- For Data2VecText, preprocessing is identical to [`RobertaModel`], including tokenization.
- For Data2VecVision, preprocessing is identical to [`BeitModel`], including feature extraction.
- To know how a pre-trained Data2Vec vision model can be fine-tuned on the task of image classification, you can check out
[this notebook](https://colab.research.google.com/github/sayakpaul/TF-2.0-Hacks/blob/master/data2vec_vision_image_classification.ipynb).
This model was contributed by [edugp](https://huggingface.co/edugp).
The original code can be found [here](https://github.com/pytorch/fairseq/tree/main/examples/data2vec).
This model was contributed by [edugp](https://huggingface.co/edugp) and [patrickvonplaten](https://huggingface.co/patrickvonplaten).
[sayakpaul](https://github.com/sayakpaul) and [Rocketknight1](https://github.com/Rocketknight1) contributed Data2Vec for vision in TensorFlow.
The original code (for NLP and Speech) can be found [here](https://github.com/pytorch/fairseq/tree/main/examples/data2vec).
The original code for vision can be found [here](https://github.com/facebookresearch/data2vec_vision/tree/main/beit).
## Data2VecTextConfig
@ -48,12 +56,16 @@ The original code can be found [here](https://github.com/pytorch/fairseq/tree/ma
[[autodoc]] Data2VecAudioConfig
## Data2VecVisionConfig
[[autodoc]] Data2VecVisionConfig
## Data2VecAudioModel
[[autodoc]] Data2VecAudioModel
- forward
## Data2VecAudioForAudioFrameClassification
[[autodoc]] Data2VecAudioForAudioFrameClassification
@ -108,3 +120,33 @@ The original code can be found [here](https://github.com/pytorch/fairseq/tree/ma
[[autodoc]] Data2VecTextForQuestionAnswering
- forward
## Data2VecVisionModel
[[autodoc]] Data2VecVisionModel
- forward
## Data2VecVisionForImageClassification
[[autodoc]] Data2VecVisionForImageClassification
- forward
## Data2VecVisionForSemanticSegmentation
[[autodoc]] Data2VecVisionForSemanticSegmentation
- forward
## TFData2VecVisionModel
[[autodoc]] TFData2VecVisionModel
- call
## TFData2VecVisionForImageClassification
[[autodoc]] TFData2VecVisionForImageClassification
- call
## TFData2VecVisionForSemanticSegmentation
[[autodoc]] TFData2VecVisionForSemanticSegmentation
- call

View File

@ -71,6 +71,12 @@ contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code
- create_token_type_ids_from_sequences
- save_vocabulary
## DebertaV2TokenizerFast
[[autodoc]] DebertaV2TokenizerFast
- build_inputs_with_special_tokens
- create_token_type_ids_from_sequences
## DebertaV2Model
[[autodoc]] DebertaV2Model
@ -101,6 +107,11 @@ contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code
[[autodoc]] DebertaV2ForQuestionAnswering
- forward
## DebertaV2ForMultipleChoice
[[autodoc]] DebertaV2ForMultipleChoice
- forward
## TFDebertaV2Model
[[autodoc]] TFDebertaV2Model

View File

@ -69,7 +69,7 @@ Tips:
*facebook/deit-base-patch16-384*. Note that one should use [`DeiTFeatureExtractor`] in order to
prepare images for the model.
This model was contributed by [nielsr](https://huggingface.co/nielsr).
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [amyeroberts](https://huggingface.co/amyeroberts).
## DeiTConfig
@ -100,3 +100,23 @@ This model was contributed by [nielsr](https://huggingface.co/nielsr).
[[autodoc]] DeiTForImageClassificationWithTeacher
- forward
## TFDeiTModel
[[autodoc]] TFDeiTModel
- call
## TFDeiTForMaskedImageModeling
[[autodoc]] TFDeiTForMaskedImageModeling
- call
## TFDeiTForImageClassification
[[autodoc]] TFDeiTForImageClassification
- call
## TFDeiTForImageClassificationWithTeacher
[[autodoc]] TFDeiTForImageClassificationWithTeacher
- call

View File

@ -113,6 +113,28 @@ Tips:
- The size of the images will determine the amount of memory being used, and will thus determine the `batch_size`.
It is advised to use a batch size of 2 per GPU. See [this Github thread](https://github.com/facebookresearch/detr/issues/150) for more info.
There are three ways to instantiate a DETR model (depending on what you prefer):
Option 1: Instantiate DETR with pre-trained weights for entire model
```py
>>> from transformers import DetrForObjectDetection
>>> model = DetrForObjectDetection.from_pretrained("facebook/resnet-50")
```
Option 2: Instantiate DETR with randomly initialized weights for Transformer, but pre-trained weights for backbone
```py
>>> from transformers import DetrConfig, DetrForObjectDetection
>>> config = DetrConfig()
>>> model = DetrForObjectDetection(config)
```
Option 3: Instantiate DETR with randomly initialized weights for backbone + Transformer
```py
>>> config = DetrConfig(use_pretrained_backbone=False)
>>> model = DetrForObjectDetection(config)
```
As a summary, consider the following table:
| Task | Object detection | Instance segmentation | Panoptic segmentation |
@ -166,4 +188,4 @@ mean Average Precision (mAP) and Panoptic Quality (PQ). The latter objects are i
## DetrForSegmentation
[[autodoc]] DetrForSegmentation
- forward
- forward

View File

@ -0,0 +1,214 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the
License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->
# Donut
## Overview
The Donut model was proposed in [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by
Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding
tasks such as document image classification, form understanding and visual question answering.
The abstract from the paper is the following:
*Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains.*
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/donut_architecture.jpg"
alt="drawing" width="600"/>
<small> Donut high-level overview. Taken from the <a href="https://arxiv.org/abs/2111.15664">original paper</a>. </small>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found
[here](https://github.com/clovaai/donut).
Tips:
- The quickest way to get started with Donut is by checking the [tutorial
notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Donut), which show how to use the model
at inference time as well as fine-tuning on custom data.
- Donut is always used within the [VisionEncoderDecoder](vision-encoder-decoder) framework.
## Inference
Donut's [`VisionEncoderDecoder`] model accepts images as input and makes use of
[`~generation_utils.GenerationMixin.generate`] to autoregressively generate text given the input image.
The [`DonutFeatureExtractor`] class is responsible for preprocessing the input image and
[`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`] decodes the generated target tokens to the target string. The
[`DonutProcessor`] wraps [`DonutFeatureExtractor`] and [`XLMRobertaTokenizer`/`XLMRobertaTokenizerFast`]
into a single instance to both extract the input features and decode the predicted token ids.
- Step-by-step Document Image Classification
```py
>>> import re
>>> from transformers import DonutProcessor, VisionEncoderDecoderModel
>>> from datasets import load_dataset
>>> import torch
>>> processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")
>>> model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> model.to(device) # doctest: +IGNORE_RESULT
>>> # load document image
>>> dataset = load_dataset("hf-internal-testing/example-documents", split="test")
>>> image = dataset[1]["image"]
>>> # prepare decoder inputs
>>> task_prompt = "<s_rvlcdip>"
>>> decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids
>>> pixel_values = processor(image, return_tensors="pt").pixel_values
>>> outputs = model.generate(
... pixel_values.to(device),
... decoder_input_ids=decoder_input_ids.to(device),
... max_length=model.decoder.config.max_position_embeddings,
... early_stopping=True,
... pad_token_id=processor.tokenizer.pad_token_id,
... eos_token_id=processor.tokenizer.eos_token_id,
... use_cache=True,
... num_beams=1,
... bad_words_ids=[[processor.tokenizer.unk_token_id]],
... return_dict_in_generate=True,
... )
>>> sequence = processor.batch_decode(outputs.sequences)[0]
>>> sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
>>> sequence = re.sub(r"<.*?>", "", sequence, count=1).strip() # remove first task start token
>>> print(processor.token2json(sequence))
{'class': 'advertisement'}
```
- Step-by-step Document Parsing
```py
>>> import re
>>> from transformers import DonutProcessor, VisionEncoderDecoderModel
>>> from datasets import load_dataset
>>> import torch
>>> processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
>>> model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> model.to(device) # doctest: +IGNORE_RESULT
>>> # load document image
>>> dataset = load_dataset("hf-internal-testing/example-documents", split="test")
>>> image = dataset[2]["image"]
>>> # prepare decoder inputs
>>> task_prompt = "<s_cord-v2>"
>>> decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids
>>> pixel_values = processor(image, return_tensors="pt").pixel_values
>>> outputs = model.generate(
... pixel_values.to(device),
... decoder_input_ids=decoder_input_ids.to(device),
... max_length=model.decoder.config.max_position_embeddings,
... early_stopping=True,
... pad_token_id=processor.tokenizer.pad_token_id,
... eos_token_id=processor.tokenizer.eos_token_id,
... use_cache=True,
... num_beams=1,
... bad_words_ids=[[processor.tokenizer.unk_token_id]],
... return_dict_in_generate=True,
... )
>>> sequence = processor.batch_decode(outputs.sequences)[0]
>>> sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
>>> sequence = re.sub(r"<.*?>", "", sequence, count=1).strip() # remove first task start token
>>> print(processor.token2json(sequence))
{'menu': {'nm': 'CINNAMON SUGAR', 'unitprice': '17,000', 'cnt': '1 x', 'price': '17,000'}, 'sub_total': {'subtotal_price': '17,000'}, 'total': {'total_price': '17,000', 'cashprice': '20,000', 'changeprice': '3,000'}}
```
- Step-by-step Document Visual Question Answering (DocVQA)
```py
>>> import re
>>> from transformers import DonutProcessor, VisionEncoderDecoderModel
>>> from datasets import load_dataset
>>> import torch
>>> processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
>>> model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
>>> device = "cuda" if torch.cuda.is_available() else "cpu"
>>> model.to(device) # doctest: +IGNORE_RESULT
>>> # load document image from the DocVQA dataset
>>> dataset = load_dataset("hf-internal-testing/example-documents", split="test")
>>> image = dataset[0]["image"]
>>> # prepare decoder inputs
>>> task_prompt = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"
>>> question = "When is the coffee break?"
>>> prompt = task_prompt.replace("{user_input}", question)
>>> decoder_input_ids = processor.tokenizer(prompt, add_special_tokens=False, return_tensors="pt").input_ids
>>> pixel_values = processor(image, return_tensors="pt").pixel_values
>>> outputs = model.generate(
... pixel_values.to(device),
... decoder_input_ids=decoder_input_ids.to(device),
... max_length=model.decoder.config.max_position_embeddings,
... early_stopping=True,
... pad_token_id=processor.tokenizer.pad_token_id,
... eos_token_id=processor.tokenizer.eos_token_id,
... use_cache=True,
... num_beams=1,
... bad_words_ids=[[processor.tokenizer.unk_token_id]],
... return_dict_in_generate=True,
... )
>>> sequence = processor.batch_decode(outputs.sequences)[0]
>>> sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
>>> sequence = re.sub(r"<.*?>", "", sequence, count=1).strip() # remove first task start token
>>> print(processor.token2json(sequence))
{'question': 'When is the coffee break?', 'answer': '11-14 to 11:39 a.m.'}
```
See the [model hub](https://huggingface.co/models?filter=donut) to look for Donut checkpoints.
## Training
We refer to the [tutorial notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/Donut).
## DonutSwinConfig
[[autodoc]] DonutSwinConfig
## DonutFeatureExtractor
[[autodoc]] DonutFeatureExtractor
- __call__
## DonutProcessor
[[autodoc]] DonutProcessor
- __call__
- from_pretrained
- save_pretrained
- batch_decode
- decode
## DonutSwinModel
[[autodoc]] DonutSwinModel
- forward

View File

@ -158,6 +158,11 @@ This model was contributed by [lysandre](https://huggingface.co/lysandre). The o
[[autodoc]] FlaxElectraForPreTraining
- __call__
## FlaxElectraForCausalLM
[[autodoc]] FlaxElectraForCausalLM
- __call__
## FlaxElectraForMaskedLM
[[autodoc]] FlaxElectraForMaskedLM

View File

@ -12,6 +12,8 @@ specific language governing permissions and limitations under the License.
# Encoder Decoder Models
## Overview
The [`EncoderDecoderModel`] can be used to initialize a sequence-to-sequence model with any
pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.
@ -25,15 +27,77 @@ any other models (see the examples for more information).
An application of this architecture could be to leverage two pretrained [`BertModel`] as the encoder
and decoder for a summarization model as was shown in: [Text Summarization with Pretrained Encoders](https://arxiv.org/abs/1908.08345) by Yang Liu and Mirella Lapata.
The [`~TFEncoderDecoderModel.from_pretrained`] currently doesn't support initializing the model from a
## Randomly initializing `EncoderDecoderModel` from model configurations.
[`EncoderDecoderModel`] can be randomly initialized from an encoder and a decoder config. In the following example, we show how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.
```python
>>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel
>>> config_encoder = BertConfig()
>>> config_decoder = BertConfig()
>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> model = EncoderDecoderModel(config=config)
```
## Initialising `EncoderDecoderModel` from a pretrained encoder and a pretrained decoder.
[`EncoderDecoderModel`] can be initialized from a pretrained encoder checkpoint and a pretrained decoder checkpoint. Note that any pretrained auto-encoding model, *e.g.* BERT, can serve as the encoder and both pretrained auto-encoding models, *e.g.* BERT, pretrained causal language models, *e.g.* GPT2, as well as the pretrained decoder part of sequence-to-sequence models, *e.g.* decoder of BART, can be used as the decoder.
Depending on which architecture you choose as the decoder, the cross-attention layers might be randomly initialized.
Initializing [`EncoderDecoderModel`] from a pretrained encoder and decoder checkpoint requires the model to be fine-tuned on a downstream task, as has been shown in [the *Warm-starting-encoder-decoder blog post*](https://huggingface.co/blog/warm-starting-encoder-decoder).
To do so, the `EncoderDecoderModel` class provides a [`EncoderDecoderModel.from_encoder_decoder_pretrained`] method.
```python
>>> from transformers import EncoderDecoderModel, BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
```
## Loading an existing `EncoderDecoderModel` checkpoint and perform inference.
To load fine-tuned checkpoints of the `EncoderDecoderModel` class, [`EncoderDecoderModel`] provides the `from_pretrained(...)` method just like any other model architecture in Transformers.
To perform inference, one uses the [`generate`] method, which allows to autoregressively generate text. This method supports various forms of decoding, such as greedy, beam search and multinomial sampling.
```python
>>> from transformers import AutoTokenizer, EncoderDecoderModel
>>> # load a fine-tuned seq2seq model and corresponding tokenizer
>>> model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
>>> tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
>>> # let's perform inference on a long piece of text
>>> ARTICLE_TO_SUMMARIZE = (
... "PG&E stated it scheduled the blackouts in response to forecasts for high winds "
... "amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were "
... "scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."
... )
>>> input_ids = tokenizer(ARTICLE_TO_SUMMARIZE, return_tensors="pt").input_ids
>>> # autoregressively generate summary (uses greedy decoding by default)
>>> generated_ids = model.generate(input_ids)
>>> generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
>>> print(generated_text)
nearly 800 thousand customers were affected by the shutoffs. the aim is to reduce the risk of wildfires. nearly 800, 000 customers were expected to be affected by high winds amid dry conditions. pg & e said it scheduled the blackouts to last through at least midday tomorrow.
```
## Loading a PyTorch checkpoint into `TFEncoderDecoderModel`.
[`TFEncoderDecoderModel.from_pretrained`] currently doesn't support initializing the model from a
pytorch checkpoint. Passing `from_pt=True` to this method will throw an exception. If there are only pytorch
checkpoints for a particular encoder-decoder model, a workaround is:
```python
>>> # a workaround to load from pytorch checkpoint
>>> from transformers import EncoderDecoderModel, TFEncoderDecoderModel
>>> _model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
>>> _model.encoder.save_pretrained("./encoder")
>>> _model.decoder.save_pretrained("./decoder")
>>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
... "./encoder", "./decoder", encoder_from_pt=True, decoder_from_pt=True
... )
@ -41,6 +105,38 @@ checkpoints for a particular encoder-decoder model, a workaround is:
>>> model.config = _model.config
```
## Training
Once the model is created, it can be fine-tuned similar to BART, T5 or any other encoder-decoder model.
As you can see, only 2 inputs are required for the model in order to compute a loss: `input_ids` (which are the
`input_ids` of the encoded input sequence) and `labels` (which are the `input_ids` of the encoded
target sequence).
```python
>>> from transformers import BertTokenizer, EncoderDecoderModel
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-uncased", "bert-base-uncased")
>>> model.config.decoder_start_token_id = tokenizer.cls_token_id
>>> model.config.pad_token_id = tokenizer.pad_token_id
>>> input_ids = tokenizer(
... "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side.During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft).Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.",
... return_tensors="pt",
... ).input_ids
>>> labels = tokenizer(
... "the eiffel tower surpassed the washington monument to become the tallest structure in the world. it was the first structure to reach a height of 300 metres in paris in 1930. it is now taller than the chrysler building by 5. 2 metres ( 17 ft ) and is the second tallest free - standing structure in paris.",
... return_tensors="pt",
... ).input_ids
>>> # the forward function automatically creates the correct decoder_input_ids
>>> loss = model(input_ids=input_ids, labels=labels).loss
```
Detailed [colab](https://colab.research.google.com/drive/1WIk2bxglElfZewOHboPFNj8H44_VAyKE?usp=sharing#scrollTo=ZwQIEhKOrJpl) for training.
This model was contributed by [thomwolf](https://github.com/thomwolf). This model's TensorFlow and Flax versions
were contributed by [ydshieh](https://github.com/ydshieh).

View File

@ -0,0 +1,102 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# ERNIE
## Overview
ERNIE is a series of powerful models proposed by baidu, especially in Chinese tasks,
including [ERNIE1.0](https://arxiv.org/abs/1904.09223), [ERNIE2.0](https://ojs.aaai.org/index.php/AAAI/article/view/6428),
[ERNIE3.0](https://arxiv.org/abs/2107.02137), [ERNIE-Gram](https://arxiv.org/abs/2010.12148), [ERNIE-health](https://arxiv.org/abs/2110.07244), etc.
These models are contributed by [nghuyong](https://huggingface.co/nghuyong) and the official code can be found in [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP) (in PaddlePaddle).
### How to use
Take `ernie-1.0-base-zh` as an example:
```Python
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("nghuyong/ernie-1.0-base-zh")
model = AutoModel.from_pretrained("nghuyong/ernie-1.0-base-zh")
```
### Supported Models
| Model Name | Language | Description |
|:-------------------:|:--------:|:-------------------------------:|
| ernie-1.0-base-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
| ernie-2.0-base-en | English | Layer:12, Heads:12, Hidden:768 |
| ernie-2.0-large-en | English | Layer:24, Heads:16, Hidden:1024 |
| ernie-3.0-base-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
| ernie-3.0-medium-zh | Chinese | Layer:6, Heads:12, Hidden:768 |
| ernie-3.0-mini-zh | Chinese | Layer:6, Heads:12, Hidden:384 |
| ernie-3.0-micro-zh | Chinese | Layer:4, Heads:12, Hidden:384 |
| ernie-3.0-nano-zh | Chinese | Layer:4, Heads:12, Hidden:312 |
| ernie-health-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
| ernie-gram-zh | Chinese | Layer:12, Heads:12, Hidden:768 |
You can find all the supported models from huggingface's model hub: [huggingface.co/nghuyong](https://huggingface.co/nghuyong), and model details from paddle's official
repo: [PaddleNLP](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/transformers/ERNIE/contents.html)
and [ERNIE](https://github.com/PaddlePaddle/ERNIE/blob/repro).
## ErnieConfig
[[autodoc]] ErnieConfig
- all
## Ernie specific outputs
[[autodoc]] models.ernie.modeling_ernie.ErnieForPreTrainingOutput
## ErnieModel
[[autodoc]] ErnieModel
- forward
## ErnieForPreTraining
[[autodoc]] ErnieForPreTraining
- forward
## ErnieForCausalLM
[[autodoc]] ErnieForCausalLM
- forward
## ErnieForMaskedLM
[[autodoc]] ErnieForMaskedLM
- forward
## ErnieForNextSentencePrediction
[[autodoc]] ErnieForNextSentencePrediction
- forward
## ErnieForSequenceClassification
[[autodoc]] ErnieForSequenceClassification
- forward
## ErnieForMultipleChoice
[[autodoc]] ErnieForMultipleChoice
- forward
## ErnieForTokenClassification
[[autodoc]] ErnieForTokenClassification
- forward
## ErnieForQuestionAnswering
[[autodoc]] ErnieForQuestionAnswering
- forward

View File

@ -0,0 +1,96 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# FLAVA
## Overview
The FLAVA model was proposed in [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela and is accepted at CVPR 2022.
The paper aims at creating a single unified foundation model which can work across vision, language
as well as vision-and-language multimodal tasks.
The abstract from the paper is the following:
*State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety
of downstream tasks. Generally, such models are often either cross-modal (contrastive) or multi-modal
(with earlier fusion) but not both; and they often only target specific modalities or tasks. A promising
direction would be to use a single holistic universal model, as a "foundation", that targets all modalities
at once -- a true vision and language foundation model should be good at vision tasks, language tasks, and
cross- and multi-modal vision and language tasks. We introduce FLAVA as such a model and demonstrate
impressive performance on a wide range of 35 tasks spanning these target modalities.*
This model was contributed by [aps](https://huggingface.co/aps). The original code can be found [here](https://github.com/facebookresearch/multimodal/tree/main/examples/flava).
## FlavaConfig
[[autodoc]] FlavaConfig
## FlavaTextConfig
[[autodoc]] FlavaTextConfig
## FlavaImageConfig
[[autodoc]] FlavaImageConfig
## FlavaMultimodalConfig
[[autodoc]] FlavaMultimodalConfig
## FlavaImageCodebookConfig
[[autodoc]] FlavaImageCodebookConfig
## FlavaProcessor
[[autodoc]] FlavaProcessor
## FlavaFeatureExtractor
[[autodoc]] FlavaFeatureExtractor
## FlavaForPreTraining
[[autodoc]] FlavaForPreTraining
- forward
## FlavaModel
[[autodoc]] FlavaModel
- forward
- get_text_features
- get_image_features
## FlavaImageCodebook
[[autodoc]] FlavaImageCodebook
- forward
- get_codebook_indices
- get_codebook_probs
## FlavaTextModel
[[autodoc]] FlavaTextModel
- forward
## FlavaImageModel
[[autodoc]] FlavaImageModel
- forward
## FlavaMultimodalModel
[[autodoc]] FlavaMultimodalModel
- forward

View File

@ -0,0 +1,76 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# GPT-NeoX
## Overview
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model trained on the Pile, whose weights will
be made freely and openly available to the public through a permissive license. It is, to the best of our knowledge,
the largest dense autoregressive model that has publicly available weights at the time of submission. In this work,
we describe GPT-NeoX-20B's architecture and training and evaluate its performance on a range of language-understanding,
mathematics, and knowledge-based tasks. We find that GPT-NeoX-20B is a particularly powerful few-shot reasoner and
gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models. We open-source
the training and evaluation code, as well as the model weights, at [https://github.com/EleutherAI/gpt-neox](https://github.com/EleutherAI/gpt-neox).
Development of the model was led by Sid Black, Stella Biderman and Eric Hallahan, and the model was trained with
generous the support of [CoreWeave](https://www.coreweave.com/).
GPT-NeoX-20B was trained with fp16, thus it is recommended to initialize the model as follows:
```python
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b").half().cuda()
```
GPT-NeoX-20B also has a different tokenizer from the one used in GPT-J-6B and GPT-Neo. The new tokenizer allocates
additional tokens to whitespace characters, making the model more suitable for certain tasks like code generation.
### Generation
The `generate()` method can be used to generate text using GPT Neo model.
```python
>>> from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast
>>> model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
>>> tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")
>>> prompt = "GPTNeoX20B is a 20B-parameter autoregressive Transformer model developed by EleutherAI."
>>> input_ids = tokenizer(prompt, return_tensors="pt").input_ids
>>> gen_tokens = model.generate(
... input_ids,
... do_sample=True,
... temperature=0.9,
... max_length=100,
... )
>>> gen_text = tokenizer.batch_decode(gen_tokens)[0]
```
## GPTNeoXConfig
[[autodoc]] GPTNeoXConfig
## GPTNeoXTokenizerFast
[[autodoc]] GPTNeoXTokenizerFast
## GPTNeoXModel
[[autodoc]] GPTNeoXModel
- forward
## GPTNeoXForCausalLM
[[autodoc]] GPTNeoXForCausalLM
- forward

View File

@ -0,0 +1,61 @@
<!--Copyright 2022 NVIDIA and The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# GroupViT
## Overview
The GroupViT model was proposed in [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
Inspired by [CLIP](clip), GroupViT is a vision-language model that can perform zero-shot semantic segmentation on any given vocabulary categories.
The abstract from the paper is the following:
*Grouping and recognition are important components of visual scene understanding, e.g., for object detection and semantic segmentation. With end-to-end deep learning systems, grouping of image regions usually happens implicitly via top-down supervision from pixel-level recognition labels. Instead, in this paper, we propose to bring back the grouping mechanism into deep networks, which allows semantic segments to emerge automatically with only text supervision. We propose a hierarchical Grouping Vision Transformer (GroupViT), which goes beyond the regular grid structure representation and learns to group image regions into progressively larger arbitrary-shaped segments. We train GroupViT jointly with a text encoder on a large-scale image-text dataset via contrastive losses. With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning. It achieves a zero-shot accuracy of 52.3% mIoU on the PASCAL VOC 2012 and 22.4% mIoU on PASCAL Context datasets, and performs competitively to state-of-the-art transfer-learning methods requiring greater levels of supervision.*
Tips:
- You may specify `output_segmentation=True` in the forward of `GroupViTModel` to get the segmentation logits of input texts.
- The quickest way to get started with GroupViT is by checking the [example notebooks](https://github.com/xvjiarui/GroupViT/blob/main/demo/GroupViT_hf_inference_notebook.ipynb) (which showcase zero-shot segmentation inference). One can also check out the [HuggingFace Spaces demo](https://huggingface.co/spaces/xvjiarui/GroupViT) to play with GroupViT.
This model was contributed by [xvjiarui](https://huggingface.co/xvjiarui).
The original code can be found [here](https://github.com/NVlabs/GroupViT).
## GroupViTConfig
[[autodoc]] GroupViTConfig
- from_text_vision_configs
## GroupViTTextConfig
[[autodoc]] GroupViTTextConfig
## GroupViTVisionConfig
[[autodoc]] GroupViTVisionConfig
## GroupViTModel
[[autodoc]] GroupViTModel
- forward
- get_text_features
- get_image_features
## GroupViTTextModel
[[autodoc]] GroupViTTextModel
- forward
## GroupViTVisionModel
[[autodoc]] GroupViTVisionModel
- forward

View File

@ -67,7 +67,8 @@ occurs. Those can be obtained using the Python Image Library (PIL) library for e
```python
from PIL import Image
image = Image.open("name_of_your_document - can be a png file, pdf, etc.")
# Document can be a png, jpg, etc. PDFs must be converted to images.
image = Image.open(name_of_your_document).convert("RGB")
width, height = image.size
```
@ -107,6 +108,10 @@ This model was contributed by [liminghao1630](https://huggingface.co/liminghao16
[[autodoc]] LayoutLMForTokenClassification
## LayoutLMForQuestionAnswering
[[autodoc]] LayoutLMForQuestionAnswering
## TFLayoutLMModel
[[autodoc]] TFLayoutLMModel
@ -122,3 +127,7 @@ This model was contributed by [liminghao1630](https://huggingface.co/liminghao16
## TFLayoutLMForTokenClassification
[[autodoc]] TFLayoutLMForTokenClassification
## TFLayoutLMForQuestionAnswering
[[autodoc]] TFLayoutLMForQuestionAnswering

View File

@ -44,6 +44,14 @@ including FUNSD (0.7895 -> 0.8420), CORD (0.9493 -> 0.9601), SROIE (0.9524 -> 0.
RVL-CDIP (0.9443 -> 0.9564), and DocVQA (0.7295 -> 0.8672). The pre-trained LayoutLMv2 model is publicly available at
this https URL.*
LayoutLMv2 depends on `detectron2`, `torchvision` and `tesseract`. Run the
following to install them:
```
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
python -m pip install torchvision tesseract
```
(If you are developing for LayoutLMv2, note that passing the doctests also requires the installation of these packages.)
Tips:
- The main difference between LayoutLMv1 and LayoutLMv2 is that the latter incorporates visual embeddings during

View File

@ -0,0 +1,106 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# LayoutLMv3
## Overview
The LayoutLMv3 model was proposed in [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
LayoutLMv3 simplifies [LayoutLMv2](layoutlmv2) by using patch embeddings (as in [ViT](vit)) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM)
and word-patch alignment (WPA).
The abstract from the paper is the following:
*Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre-training objectives for the image modality. This discrepancy adds difficulty to multimodal representation learning. In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking. Additionally, LayoutLMv3 is pre-trained with a word-patch alignment objective to learn cross-modal alignment by predicting whether the corresponding image patch of a text word is masked. The simple unified architecture and training objectives make LayoutLMv3 a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks. Experimental results show that LayoutLMv3 achieves state-of-the-art performance not only in text-centric tasks, including form understanding, receipt understanding, and document visual question answering, but also in image-centric tasks such as document image classification and document layout analysis.*
Tips:
- In terms of data processing, LayoutLMv3 is identical to its predecessor [LayoutLMv2](layoutlmv2), except that:
- images need to be resized and normalized with channels in regular RGB format. LayoutLMv2 on the other hand normalizes the images internally and expects the channels in BGR format.
- text is tokenized using byte-pair encoding (BPE), as opposed to WordPiece.
Due to these differences in data preprocessing, one can use [`LayoutLMv3Processor`] which internally combines a [`LayoutLMv3FeatureExtractor`] (for the image modality) and a [`LayoutLMv3Tokenizer`]/[`LayoutLMv3TokenizerFast`] (for the text modality) to prepare all data for the model.
- Regarding usage of [`LayoutLMv3Processor`], we refer to the [usage guide](layoutlmv2#usage-layoutlmv2processor) of its predecessor.
- Demo notebooks for LayoutLMv3 can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/LayoutLMv3).
- Demo scripts can be found [here](https://github.com/huggingface/transformers/tree/main/examples/research_projects/layoutlmv3).
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/layoutlmv3_architecture.png"
alt="drawing" width="600"/>
<small> LayoutLMv3 architecture. Taken from the <a href="https://arxiv.org/abs/2204.08387">original paper</a>. </small>
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version of this model was added by [chriskoo](https://huggingface.co/chriskoo), [tokec](https://huggingface.co/tokec), and [lre](https://huggingface.co/lre). The original code can be found [here](https://github.com/microsoft/unilm/tree/master/layoutlmv3).
## LayoutLMv3Config
[[autodoc]] LayoutLMv3Config
## LayoutLMv3FeatureExtractor
[[autodoc]] LayoutLMv3FeatureExtractor
- __call__
## LayoutLMv3Tokenizer
[[autodoc]] LayoutLMv3Tokenizer
- __call__
- save_vocabulary
## LayoutLMv3TokenizerFast
[[autodoc]] LayoutLMv3TokenizerFast
- __call__
## LayoutLMv3Processor
[[autodoc]] LayoutLMv3Processor
- __call__
## LayoutLMv3Model
[[autodoc]] LayoutLMv3Model
- forward
## LayoutLMv3ForSequenceClassification
[[autodoc]] LayoutLMv3ForSequenceClassification
- forward
## LayoutLMv3ForTokenClassification
[[autodoc]] LayoutLMv3ForTokenClassification
- forward
## LayoutLMv3ForQuestionAnswering
[[autodoc]] LayoutLMv3ForQuestionAnswering
- forward
## TFLayoutLMv3Model
[[autodoc]] TFLayoutLMv3Model
- call
## TFLayoutLMv3ForSequenceClassification
[[autodoc]] TFLayoutLMv3ForSequenceClassification
- call
## TFLayoutLMv3ForTokenClassification
[[autodoc]] TFLayoutLMv3ForTokenClassification
- call
## TFLayoutLMv3ForQuestionAnswering
[[autodoc]] TFLayoutLMv3ForQuestionAnswering
- call

View File

@ -44,8 +44,10 @@ Tips:
- LED makes use of *global attention* by means of the `global_attention_mask` (see
[`LongformerModel`]). For summarization, it is advised to put *global attention* only on the first
`<s>` token. For question answering, it is advised to put *global attention* on all tokens of the question.
- To fine-tune LED on all 16384, it is necessary to enable *gradient checkpointing* by executing
`model.gradient_checkpointing_enable()`.
- To fine-tune LED on all 16384, *gradient checkpointing* can be enabled in case training leads to out-of-memory (OOM)
errors. This can be done by executing `model.gradient_checkpointing_enable()`.
Moreover, the `use_cache=False`
flag can be used to disable the caching mechanism to save memory.
- A notebook showing how to evaluate LED, can be accessed [here](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing).
- A notebook showing how to fine-tune LED, can be accessed [here](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing).

View File

@ -0,0 +1,87 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# LeViT
## Overview
The LeViT model was proposed in [LeViT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze. LeViT improves the [Vision Transformer (ViT)](vit) in performance and efficiency by a few architectural differences such as activation maps with decreasing resolutions in Transformers and the introduction of an attention bias to integrate positional information.
The abstract from the paper is the following:
*We design a family of image classification architectures that optimize the trade-off between accuracy
and efficiency in a high-speed regime. Our work exploits recent findings in attention-based architectures,
which are competitive on highly parallel processing hardware. We revisit principles from the extensive
literature on convolutional neural networks to apply them to transformers, in particular activation maps
with decreasing resolutions. We also introduce the attention bias, a new way to integrate positional information
in vision transformers. As a result, we propose LeVIT: a hybrid neural network for fast inference image classification.
We consider different measures of efficiency on different hardware platforms, so as to best reflect a wide range of
application scenarios. Our extensive experiments empirically validate our technical choices and show they are suitable
to most architectures. Overall, LeViT significantly outperforms existing convnets and vision transformers with respect
to the speed/accuracy tradeoff. For example, at 80% ImageNet top-1 accuracy, LeViT is 5 times faster than EfficientNet on CPU. *
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/levit_architecture.png"
alt="drawing" width="600"/>
<small> LeViT Architecture. Taken from the <a href="https://arxiv.org/abs/2104.01136">original paper</a>.</small>
Tips:
- Compared to ViT, LeViT models use an additional distillation head to effectively learn from a teacher (which, in the LeViT paper, is a ResNet like-model). The distillation head is learned through backpropagation under supervision of a ResNet like-model. They also draw inspiration from convolution neural networks to use activation maps with decreasing resolutions to increase the efficiency.
- There are 2 ways to fine-tune distilled models, either (1) in a classic way, by only placing a prediction head on top
of the final hidden state and not using the distillation head, or (2) by placing both a prediction head and distillation
head on top of the final hidden state. In that case, the prediction head is trained using regular cross-entropy between
the prediction of the head and the ground-truth label, while the distillation prediction head is trained using hard distillation
(cross-entropy between the prediction of the distillation head and the label predicted by the teacher). At inference time,
one takes the average prediction between both heads as final prediction. (2) is also called "fine-tuning with distillation",
because one relies on a teacher that has already been fine-tuned on the downstream dataset. In terms of models, (1) corresponds
to [`LevitForImageClassification`] and (2) corresponds to [`LevitForImageClassificationWithTeacher`].
- All released checkpoints were pre-trained and fine-tuned on [ImageNet-1k](https://huggingface.co/datasets/imagenet-1k)
(also referred to as ILSVRC 2012, a collection of 1.3 million images and 1,000 classes). only. No external data was used. This is in
contrast with the original ViT model, which used external data like the JFT-300M dataset/Imagenet-21k for
pre-training.
- The authors of LeViT released 5 trained LeViT models, which you can directly plug into [`LevitModel`] or [`LevitForImageClassification`].
Techniques like data augmentation, optimization, and regularization were used in order to simulate training on a much larger dataset
(while only using ImageNet-1k for pre-training). The 5 variants available are (all trained on images of size 224x224):
*facebook/levit-128S*, *facebook/levit-128*, *facebook/levit-192*, *facebook/levit-256* and
*facebook/levit-384*. Note that one should use [`LevitFeatureExtractor`] in order to
prepare images for the model.
- [`LevitForImageClassificationWithTeacher`] currently supports only inference and not training or fine-tuning.
- You can check out demo notebooks regarding inference as well as fine-tuning on custom data [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/VisionTransformer)
(you can just replace [`ViTFeatureExtractor`] by [`LevitFeatureExtractor`] and [`ViTForImageClassification`] by [`LevitForImageClassification`] or [`LevitForImageClassificationWithTeacher`]).
This model was contributed by [anugunj](https://huggingface.co/anugunj). The original code can be found [here](https://github.com/facebookresearch/LeViT).
## LevitConfig
[[autodoc]] LevitConfig
## LevitFeatureExtractor
[[autodoc]] LevitFeatureExtractor
- __call__
## LevitModel
[[autodoc]] LevitModel
- forward
## LevitForImageClassification
[[autodoc]] LevitForImageClassification
- forward
## LevitForImageClassificationWithTeacher
[[autodoc]] LevitForImageClassificationWithTeacher
- forward

View File

@ -0,0 +1,121 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# LongT5
## Overview
The LongT5 model was proposed in [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916)
by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung and Yinfei Yang. It's an
encoder-decoder transformer pre-trained in a text-to-text denoising generative setting. LongT5 model is an extension of
T5 model, and it enables using one of the two different efficient attention mechanisms - (1) Local attention, or (2)
Transient-Global attention.
The abstract from the paper is the following:
*Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the
performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we
explore the effects of scaling both the input length and model size at the same time. Specifically, we integrated
attention ideas from long-input transformers (ETC), and adopted pre-training strategies from summarization pre-training
(PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call {\em Transient Global}
(TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs. We are
able to achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on
question answering tasks.*
Tips:
- [`LongT5ForConditionalGeneration`] is an extension of [`T5ForConditionalGeneration`] exchanging the traditional
encoder *self-attention* layer with efficient either *local* attention or *transient-global* (*tglobal*) attention.
- Unlike the T5 model, LongT5 does not use a task prefix. Furthermore, it uses a different pre-training objective
inspired by the pre-training of [`PegasusForConditionalGeneration`].
- LongT5 model is designed to work efficiently and very well on long-range *sequence-to-sequence* tasks where the
input sequence exceeds commonly used 512 tokens. It is capable of handling input sequences of a length up to 16,384 tokens.
- For *Local Attention*, the sparse sliding-window local attention operation allows a given token to attend only `r`
tokens to the left and right of it (with `r=127` by default). *Local Attention* does not introduce any new parameters
to the model. The complexity of the mechanism is linear in input sequence length `l`: `O(l*r)`.
- *Transient Global Attention* is an extension of the *Local Attention*. It, furthermore, allows each input token to
interact with all other tokens in the layer. This is achieved via splitting an input sequence into blocks of a fixed
length `k` (with a default `k=16`). Then, a global token for such a block is obtained via summing and normalizing the embeddings of every token
in the block. Thanks to this, the attention allows each token to attend to both nearby tokens like in Local attention, and
also every global token like in the case of standard global attention (*transient* represents the fact the global tokens
are constructed dynamically within each attention operation). As a consequence, *TGlobal* attention introduces
a few new parameters -- global relative position biases and a layer normalization for global token's embedding.
The complexity of this mechanism is `O(l(r + l/k))`.
- An example showing how to evaluate a fine-tuned LongT5 model on the [pubmed dataset](https://huggingface.co/datasets/scientific_papers) is below.
```python
>>> import evaluate
>>> from datasets import load_dataset
>>> from transformers import AutoTokenizer, LongT5ForConditionalGeneration
>>> dataset = load_dataset("scientific_papers", "pubmed", split="validation")
>>> model = (
... LongT5ForConditionalGeneration.from_pretrained("Stancld/longt5-tglobal-large-16384-pubmed-3k_steps")
... .to("cuda")
... .half()
... )
>>> tokenizer = AutoTokenizer.from_pretrained("Stancld/longt5-tglobal-large-16384-pubmed-3k_steps")
>>> def generate_answers(batch):
... inputs_dict = tokenizer(
... batch["article"], max_length=16384, padding="max_length", truncation=True, return_tensors="pt"
... )
... input_ids = inputs_dict.input_ids.to("cuda")
... attention_mask = inputs_dict.attention_mask.to("cuda")
... output_ids = model.generate(input_ids, attention_mask=attention_mask, max_length=512, num_beams=2)
... batch["predicted_abstract"] = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
... return batch
>>> result = dataset.map(generate_answer, batched=True, batch_size=2)
>>> rouge = evaluate.load("rouge")
>>> rouge.compute(predictions=result["predicted_abstract"], references=result["abstract"])
```
This model was contributed by [stancld](https://huggingface.co/stancld).
The original code can be found [here](https://github.com/google-research/longt5).
## LongT5Config
[[autodoc]] LongT5Config
## LongT5Model
[[autodoc]] LongT5Model
- forward
## LongT5ForConditionalGeneration
[[autodoc]] LongT5ForConditionalGeneration
- forward
## LongT5EncoderModel
[[autodoc]] LongT5EncoderModel
- forward
## FlaxLongT5Model
[[autodoc]] FlaxLongT5Model
- __call__
- encode
- decode
## FlaxLongT5ForConditionalGeneration
[[autodoc]] FlaxLongT5ForConditionalGeneration
- __call__
- encode
- decode

View File

@ -97,7 +97,7 @@ Example:
>>> entities = [
... "Beyoncé",
... "Los Angeles",
>>> ] # Wikipedia entity titles corresponding to the entity mentions "Beyoncé" and "Los Angeles"
... ] # Wikipedia entity titles corresponding to the entity mentions "Beyoncé" and "Los Angeles"
>>> entity_spans = [(0, 7), (17, 28)] # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
>>> inputs = tokenizer(text, entities=entities, entity_spans=entity_spans, add_prefix_space=True, return_tensors="pt")
>>> outputs = model(**inputs)
@ -152,3 +152,23 @@ This model was contributed by [ikuyamada](https://huggingface.co/ikuyamada) and
[[autodoc]] LukeForEntitySpanClassification
- forward
## LukeForSequenceClassification
[[autodoc]] LukeForSequenceClassification
- forward
## LukeForMultipleChoice
[[autodoc]] LukeForMultipleChoice
- forward
## LukeForTokenClassification
[[autodoc]] LukeForTokenClassification
- forward
## LukeForQuestionAnswering
[[autodoc]] LukeForQuestionAnswering
- forward

View File

@ -55,9 +55,7 @@ tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", src_lang="en
src_text = "Life is like a box of chocolates."
tgt_text = "La vie est comme une boîte de chocolat."
model_inputs = tokenizer(src_text, return_tensors="pt")
with tokenizer.as_target_tokenizer():
labels = tokenizer(tgt_text, return_tensors="pt").input_ids
model_inputs = tokenizer(src_text, text_target=tgt_text, return_tensors="pt")
loss = model(**model_inputs, labels=labels) # forward pass
```

View File

@ -155,7 +155,7 @@ Example of translating english to many romance languages, using old-style 2 char
## MarianTokenizer
[[autodoc]] MarianTokenizer
- as_target_tokenizer
- build_inputs_with_special_tokens
## MarianModel

Some files were not shown because too many files have changed in this diff Show More