Compare commits

...

651 Commits

Author SHA1 Message Date
d7503f5068 Add docs README 2022-03-08 11:55:07 -05:00
e82782cc10 Update for doc-builder -> hf-doc-utils 2022-03-08 11:52:43 -05:00
70203b5937 TF generate refactor - past without encoder outputs (#15944)
* Remove packed past from generation_tf_utils

* update models with the new past format

* update template accordingly
2022-03-08 14:46:44 +00:00
62d847602a Update TF multiple choice example (#15868) 2022-03-08 13:16:34 +00:00
ab2f8d12a7 add hf hub to env version command (#15981) 2022-03-08 14:03:03 +01:00
72983303c5 Fix TFEncoderDecoderModelTest - Pytorch device (#15979)
* fix device

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-08 13:37:20 +01:00
f5a080dd10 Do a pull in case docs were updated during build (#15922) 2022-03-08 07:19:41 -05:00
91fb62d01c Speedup training by using numpy instead of jnp for batch shuffling (#15963)
Speedup training by using numpy instead of jnp for batch shuffling

Co-authored-by: Yeb Havinga <y.t.havinga@mgrid.net>
2022-03-08 12:18:38 +01:00
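A minimal sketch of the idea behind #15963 (function names here are hypothetical): generate shuffled batch indices on the host with NumPy instead of on-device with `jax.numpy`, avoiding an accelerator round-trip for plain index shuffling.

```python
import numpy as np
import jax

def batch_indices_jnp(rng, num_samples):
    # on-device permutation: runs on the accelerator
    return jax.random.permutation(rng, num_samples)

def batch_indices_np(seed, num_samples):
    # host-side permutation: plain NumPy, cheaper for simple index shuffling
    return np.random.default_rng(seed).permutation(num_samples)

rng = jax.random.PRNGKey(0)
print(batch_indices_jnp(rng, 8))
print(batch_indices_np(0, 8))
```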
ea07064a5c Returning outputs only when asked for, for MaskFormer. (#15936)
* Returning outputs only when asked for, for MaskFormer.

* Adding `output_auxiliary_logits` to the config.
2022-03-08 11:17:57 +01:00
b19f3e69a0 [Tests] Fix ViTMAE integration test (#15949)
* Fix test across both cpu and gpu

* Fix typo
2022-03-08 10:49:44 +01:00
9879a1d5f0 Fix LayoutLMv2 test (#15939)
* Fix LayoutLMv2 test

* Update black
2022-03-08 10:49:30 +01:00
8b9ae45549 Set scale_embedding to False in some TF tests (#15952)
* set scale_embedding to False to avoid large (> 1e-5) output differences between PT/TF

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-07 22:14:33 +01:00
38cc35069c Update training scripts docs (#15931)
* 📝 first draft

* 🖍 apply feedback

* 🖍 remove examples from toctree

* 🗑 remove examples from docs/source
2022-03-07 13:29:14 -06:00
c87cfd653c Better error message when inputs are empty 2022-03-07 13:29:16 -05:00
e9fa7cd5d7 Make is_thing_map in Feature Extractor post_process_panoptic_segmentation default to all instances (#15954)
* is_thing_map defaults to all instances

* better naming

* control flow

* resolving conversations
2022-03-07 19:10:32 +01:00
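A hedged illustration of the new default from #15954 (the checkpoint name is an assumption based on the "change weights to facebook" note elsewhere in this log): when `is_thing_map` is omitted, post-processing now treats every class as an instance.

```python
from transformers import MaskFormerFeatureExtractor, MaskFormerForInstanceSegmentation

# checkpoint name is an assumption
feature_extractor = MaskFormerFeatureExtractor.from_pretrained("facebook/maskformer-swin-base-coco")
model = MaskFormerForInstanceSegmentation.from_pretrained("facebook/maskformer-swin-base-coco")

# inputs = feature_extractor(images=image, return_tensors="pt")  # `image` is a PIL image
# outputs = model(**inputs)
# With no is_thing_map passed, every class now defaults to being an instance ("thing"):
# result = feature_extractor.post_process_panoptic_segmentation(outputs)
```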
2596f95e84 Fix Embedding Module Bug in Flax Models (#15920) 2022-03-07 18:17:45 +01:00
1a62b25caf Backprop Test for Freeze FlaxWav2Vec2 Feature Encoder (#15938)
* Backprop Test for Freeze FlaxWav2Vec2 Feature Encoder

* remove jnp.ndarray type suggestion

* assert frozen grads are precisely zero
2022-03-07 18:10:15 +01:00
544fd9876b Support modern list type hints in HfArgumentParser (#15951)
* Support modern list type hint in HfArgumentParser

* Fix formatting with black
2022-03-07 10:22:48 -05:00
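A minimal sketch of what #15951 enables (the dataclass and its fields are made up): PEP 585 built-in generics like `list[int]` in an `HfArgumentParser` dataclass, on Python 3.9+.

```python
from dataclasses import dataclass, field
from transformers import HfArgumentParser

@dataclass
class TrainingConfig:
    # illustrative fields; built-in generics (PEP 585) now work alongside typing.List
    seeds: list[int] = field(default_factory=lambda: [42])
    learning_rates: list[float] = field(default_factory=lambda: [1e-4])

parser = HfArgumentParser(TrainingConfig)
(config,) = parser.parse_args_into_dataclasses(args=["--seeds", "1", "2", "3"])
print(config.seeds)  # [1, 2, 3]
```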
60b81dfa6f remove re-definition of FlaxWav2Vec2ForCTCModule (#15965) 2022-03-07 14:58:44 +01:00
ef9c3ca348 [Bug Fix] Beam search example in docs fails & a fix (integrating max_length in BeamScorer.finalize()) (#15555)
* added the test and fix

* had left out a comment
2022-03-07 09:10:18 +01:00
9932ee4b4b made MaskFormerModelTest faster (#15942) 2022-03-04 19:11:48 +01:00
e8efaecb87 Move dependency to call method (#15941) 2022-03-04 18:53:54 +01:00
5c6f57ee75 Constrained Beam Search [*With* Disjunctive Decoding] (#15761)
* added classes to get started with constrained beam search

* in progress, think i can directly force tokens now but not yet with the round robin

* think now i have total control, now need to code the bank selection

* technically works as desired, need to optimize and fix design choices leading to undesirable outputs

* complete PR #1 without disjunctive decoding

* removed incorrect tests

* Delete k.txt

* Delete test.py

* Delete test.sh

* revert changes to test scripts

* genutils

* full implementation with testing, no disjunctive yet

* shifted docs

* passing all tests realistically ran locally

* removing accidentally included print statements

* fixed source of error in initial PR test

* fixing the get_device() vs device trap

* fixed documentation docstrings about constrained_beam_search

* fixed tests failing for Speech2TextModel's floating point inputs

* fix cuda long tensor

* added examples and testing for them and found & fixed a bug in beam_search and constrained_beam_search

* deleted accidentally added test halting code with assert False

* code reformat

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

* fixing based on comments on PR

* took out the testing code that should work but fails without the beam search modification; style changes

* fixing comments issues

* docstrings for ConstraintListState

* typo in PhrasalConstraint docstring

* docstrings improvements

* finished adding what is sort of an opinionated implementation of disjunctive generation, but it revealed errors in inner beam search logic during testing.

* fixed bug found in constrained beam search that used beam_idx that were not global across all the batches

* disjunctive constraint working 100% correctly

* passing all tests

* Accidentally included mlruns

* Update src/transformers/generation_beam_constraints.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/generation_beam_constraints.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* complete overhaul of type complexities and other nits

* strict type checks in generate()

* fixing second round of feedback by narsil

* fixed failing generation test because of type check overhaul

* generation test fail fix

* fixing test fails

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-03-04 18:18:34 +01:00
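A hedged sketch of the disjunctive decoding added in #15761, as exposed through `generate` (the model and word choices here are arbitrary): a nested `force_words_ids` entry lists interchangeable forms, and constrained beam search must emit at least one of them.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# One disjunctive constraint: the generation must contain at least one variant.
flexible_words = ["scream", "screams", "screaming", "screamed"]
force_words_ids = [tokenizer(flexible_words, add_prefix_space=True, add_special_tokens=False).input_ids]

input_ids = tokenizer("The child", return_tensors="pt").input_ids
out = model.generate(input_ids, force_words_ids=force_words_ids, num_beams=10, no_repeat_ngram_size=1)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```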
040c11f6da Tests for MaskFormerFeatureExtractor's post_process*** methods (#15929)
* proper tests for post_process*** methods in feature extractor

* mask th == 0

* Update tests/maskformer/test_feature_extraction_maskformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-04 18:04:19 +01:00
f0aacc140b Do not change the output from tuple to list - to match PT's version (#15918)
* Do not change the output from tuple to list - to match PT's version

* Fix the same issues for 5 other models and the template

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-04 17:50:24 +01:00
10b76987fc [FlaxT5 Example] fix flax t5 example pretraining (#15835) 2022-03-04 17:04:43 +01:00
01485ceec3 Add missing support for Flax XLM-RoBERTa (#15900)
* Adding Flax XLM-RoBERTa

* Add Flax to __init__

* Adding doc and dummy objects

* Add tests

* Add Flax XLM-R models autodoc

* Fix tests

* Add Flax XLM-RoBERTa to TEST_FILES_WITH_NO_COMMON_TESTS

* Update src/transformers/models/xlm_roberta/modeling_flax_xlm_roberta.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update tests/xlm_roberta/test_modeling_flax_xlm_roberta.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update tests/xlm_roberta/test_modeling_flax_xlm_roberta.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Remove test on large Flax XLM-RoBERTa

* Add tokenizer to the test

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2022-03-04 14:36:28 +01:00
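Hedged usage sketch for the Flax XLM-RoBERTa classes added in #15900; whether Flax weights are published for `xlm-roberta-base` is an assumption, hence the `from_pt` note.

```python
from transformers import AutoTokenizer, FlaxXLMRobertaModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
# add from_pt=True if the checkpoint only ships PyTorch weights
model = FlaxXLMRobertaModel.from_pretrained("xlm-roberta-base")

inputs = tokenizer("Hello world", return_tensors="np")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```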
89c7d9cfba Making MaskFormerForInstanceSegmentation. (#15934)
Small adjustments.

Adding in type hint.

Last fix ?

Only include the default dict thing, not the pipelines.
2022-03-04 13:56:15 +01:00
7ade7c1794 Updating the slow tests: (#15893)
Linked to https://github.com/huggingface/transformers/pull/15826
2022-03-04 12:32:19 +01:00
6b104c5bb0 Support CLIPTokenizerFast for CLIPProcessor (#15913)
* Fix to support fast tokenizer with `CLIPProcessor`

* Update CLIPProcessor test for fast tokenizer

* Fix Docstring Style

* Rename into meaningful Variable name in test code
2022-03-04 11:57:09 +01:00
b71474895d Update README.md 2022-03-04 09:58:45 +01:00
a6e3b17981 Re-enabling all fast pipeline tests. (#15924) 2022-03-04 09:53:00 +01:00
a7df656f03 Update README.md (#15926) 2022-03-04 00:22:38 +01:00
c0281feb50 Fix #15898 (#15928) 2022-03-03 14:41:03 -05:00
9251427c38 Add vision models to doc tests (#15905)
* Add vision models to doc tests

* Apply suggestions from code review

* Add more models

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-03 19:46:31 +01:00
742273a52a fix for the output from post_process_panoptic_segmentation (#15916) 2022-03-03 19:35:48 +01:00
7c45fe747f Mark slow tests as slow 2022-03-03 11:03:24 -05:00
3822e4a563 Enabling MaskFormer in pipelines (#15917)
* Enabling MaskFormer in pipelines

No AutoModel though :(

* Oops, local file.
2022-03-03 16:31:41 +01:00
79d28e80b6 v4.18.0.dev.0 2022-03-03 10:19:58 -05:00
6cbfa7bf4c [Doctests] Fix ignore bug and add more doc tests (#15911)
* finish speech doc tests

* finish

* boom

* Update src/transformers/models/speech_to_text/modeling_speech_to_text.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-03 16:01:56 +01:00
b693cbf99c The tests were not updated after the addition of torch.diag (#15890)
in the scoring (which is more correct)
2022-03-03 15:33:49 +01:00
3c4fbc616f Freeze FlaxWav2Vec2 Feature Encoder (#15873)
* Freeze FlaxWav2Vec2 Feature Encoder

* add to all module apply

* add backprop test
2022-03-03 14:17:13 +01:00
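A hedged sketch of the freezing added in #15873: the FlaxWav2Vec2 call accepts a `freeze_feature_encoder` flag that stops gradients from flowing into the convolutional feature encoder. The checkpoint name is an assumption.

```python
import jax.numpy as jnp
from transformers import FlaxWav2Vec2Model

# from_pt=True may be needed if no Flax weights are published for this checkpoint
model = FlaxWav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

input_values = jnp.zeros((1, 16000))  # one second of dummy 16 kHz audio
# with the flag set, gradients do not flow into the feature encoder during training
outputs = model(input_values, freeze_feature_encoder=True)
print(outputs.last_hidden_state.shape)
```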
7b3bd1f21a Fix and improve REALM fine-tuning (#15297)
* Draft

* Add test

* Update src/transformers/models/realm/modeling_realm.py

* Apply suggestion

* Add block_mask

* Update

* Update

* Add block_embedding_to

* Remove no_grad

* Use AutoTokenizer

* Remove model.to overridding
2022-03-03 14:10:15 +01:00
439de3f7f9 [Fix link in pipeline doc] (#15906) 2022-03-03 07:43:13 -05:00
4cd7ed4b3b Fix a TF Vision Encoder Decoder test (#15896)
* send PyTorch inputs to the correct device

* Fix: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-03-03 13:21:31 +01:00
39249c9589 Fix doc links in release utils (#15903) 2022-03-02 18:06:31 -05:00
3d2242869d Update delete-dev-doc job to match build-dev-doc (#15891)
* Update delete-dev-doc job to match build-dev-doc

* More debug info

* More debug info

* Stash if needed

* Remove the comment update

* Fix paths

* Wtf is going on..

* Fix git status test

* Try another way

* I don't understand what's happening

* Bash shell

* What's happening now...

* What's happening now...

* Try like this

* Back to trying to use bash

* And like that?

* Refine tests

* Stash after adding new files

* Stash after adding new files

* Proper commit sha and PR number

* Address review comments
2022-03-02 16:18:54 -05:00
89be34c36c Fix SegformerForImageClassification (#15895)
* Fix reshape

* Apply suggestion from code review

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-02 21:57:39 +01:00
130b987880 [XGLM] run sampling test on CPU to be deterministic (#15892)
* run sampling test on CPU to be deterministic

* input_ids on CPU
2022-03-02 17:55:49 +01:00
baab5e7cdf TF generate refactor - Sample (#15793)
* Add TF logits wrappers 

* Add sample method

* add tests for TF logit wrappers

* TF generate sample tests now run on CPU

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-03-02 16:13:54 +00:00
96ae92be8c [SegFormer] Add deprecation warning (#15889)
* Add deprecation warning

* Remove from docs and hide in kwargs

* Improve implementation

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-02 16:20:47 +01:00
8fd4731072 Fix Bug in FlaxWav2Vec2 Slow Test (#15887) 2022-03-02 16:02:26 +01:00
d83d22f578 Maskformer (#15682)
* maskformer

* conflicts

* conflicts

* minor fixes

* feature extractor test fix

refactor MaskFormerLoss following conversation

MaskFormer-related types should not trigger an error at module import time

missed one

removed all the types that are not used

update config mapping

minor updates in the doc

resolved conversation that doesn't need a discussion

minor changes

resolved conversations

fixed DetrDecoder

* minor changes

minor changes

fixed mdx file

test feature_extractor return types

functional losses -> classes

removed the return type test for the feature extractor

minor changes + style + quality

* conflicts?

* rebase master

* readme

* added missing files

* deleted poolformer tests that were in the wrong place

* CI

* minor changes

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* resolved conversations

* minor changes

* conversations

* minor changes

* doc fix in feature extractor

* doc

* typos

* removed detr logic from config

* removed detr logic from config

* removed num_labels

* small fix in the config

* auxilary -> auxiliary

* make style

* some test is failing

* fix a weird char in config preventing doc-builder

* retry to fix the doc-builder issue

* make style

* new try to fix the doc builder

* CI

* change weights to facebook

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-03-02 15:48:20 +01:00
e535c389aa Fix tiny typo (#15884) 2022-03-02 15:37:05 +01:00
2eb7bb15e7 Updates in Trainer to support new features in SM Model Parallel library (#15877)
* Create optimizer after model creation for SMP

* update dp_rank to rdp_rank for opt_state_dict

* update world_size and process_index for smp

* Address comments

* Lint fix

Co-authored-by: Cavdar <dcavdar@a07817b12d7e.ant.amazon.com>
2022-03-02 07:55:14 -05:00
05c237ea94 Update TF QA example (#15870) 2022-03-02 10:38:13 +00:00
6e57a56987 Adding timestamps for CTC with LM in ASR pipeline. (#15863)
* Adding timestamps for CTC with LM in ASR pipeline.

* Remove print.

* Nit change.
2022-03-02 10:49:05 +01:00
8a133490bf Add TF generate sample tests with all logit processors (#15852)
* Add GPT2 TF generate sample test with all logits processor

* Add T5 generate sample test
2022-03-02 09:48:11 +00:00
40040727ab [Bart] Fix implementation note doc (#15879) 2022-03-02 10:24:32 +01:00
4bfe75bd08 M2M100 support for ONNX export (#15193)
* Add M2M100 support for ONNX export

* Delete useless imports

* Add M2M100 to tests

* Fix protobuf issue
2022-03-02 10:03:14 +01:00
d1a29078c0 Remove stash for now (#15882) 2022-03-01 22:36:19 -05:00
b842d7277a fix deepspeed tests (#15881)
* fix deepspeed tests

* style

* more fixes
2022-03-01 19:27:28 -08:00
6ccfa2170c Inference for multilingual models (#15836)
* 📝 first draft for multilingual models

* 🖍 make style
2022-03-01 15:10:31 -06:00
26426923b7 No self-hosted runner for dev documentation (#15710) 2022-03-01 14:05:54 -05:00
00eaffc81f Bump up doc node version to 16 (#15874) 2022-03-01 18:37:57 +01:00
afca0d5192 use python 3.7 for flax self-push tests (#15865)
* set python 3.7 for flax tests

* setup-python@v2

* python-dev

* install -y

* python3-dev

* install kenlm from source

* install cython

* cd to kenlm

* kenlm install

* don't install kenlm

* change flax pretrained to run flax tests

* cleanup

* remove python-dev
2022-03-01 18:26:30 +01:00
286fdc6b3c [vision] Add problem_type support (#15851)
* Add problem_type to missing models

* Fix deit test

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-01 18:09:52 +01:00
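A hedged sketch of `problem_type` on a vision classifier (assuming ViT is among the models #15851 covers): with `"multi_label_classification"` the head uses BCEWithLogitsLoss and expects float multi-hot labels.

```python
import torch
from transformers import ViTConfig, ViTForImageClassification

config = ViTConfig(num_labels=3, problem_type="multi_label_classification")
model = ViTForImageClassification(config)  # randomly initialized, for illustration

pixel_values = torch.randn(1, 3, 224, 224)
labels = torch.tensor([[1.0, 0.0, 1.0]])  # multi-hot float labels for BCEWithLogitsLoss
outputs = model(pixel_values=pixel_values, labels=labels)
print(outputs.loss)
```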
7ff9d450cd Scatter should run on CUDA (#15872) 2022-03-01 11:47:17 -05:00
c008afea3c Add link to notebooks (#15791)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-03-01 17:44:20 +01:00
e064f08150 Add time stamps for wav2vec2 with lm (#15854)
* [Wav2Vec2 With LM] add timestamps

* correct

* correct

* Apply suggestions from code review

* correct

* Update src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py

* make style

* Update src/transformers/models/wav2vec2_with_lm/processing_wav2vec2_with_lm.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* make style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-01 17:03:05 +01:00
3f2e636850 Update TF LM examples (#15855) 2022-03-01 14:12:58 +00:00
54f0db4066 Add PT + TF automatic builds (#15860)
* Add PT + TF automatic builds

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Wrap up

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-03-01 08:55:11 -05:00
9863f7d228 [Benchmark tools] Deprecate all (#15848)
* [Benchmark tools] Deprecate all

* up
2022-03-01 11:26:20 +01:00
df5a4094a6 Add Data2Vec (#15507)
* Add data2vec model cloned from roberta

* Add checkpoint conversion script

* Fix copies

* Update docs

* Add checkpoint conversion script

* Remove fairseq data2vec_text script and fix format

* Add comment on where to get data2vec_text.py

* Remove mock implementation cheat.py and fix style

* Fix copies

* Remove TF and Flax classes from init

* Add back copy from fairseq data2vec_text.py and fix style

* Update model name in docs/source/index.mdx to be CamelCase

* Revert model name in table to lower-case to get check_table test to pass

* Update src/transformers/models/data2vec/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/data2vec.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update documentation

* Copy-paste Data2VecConfig from BertConfig

* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency

* Update config special tokens to match RoBERTa

* Split multiple assertions and add individual error messages

* Rename Data2VecModel to Data2VecForTextModel

* Add Data2Vec to _toctree.yml

* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings

* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).

* finish audio model

* finish audio file

* Update names and fix style, quality and repo consistency

* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files.

* add inputs to logits to data2vec

* correct audio models

* correct config auto

* correct tok auto

* Update utils/tests_fetcher.py

* delete unnecessary files

* delete unnecessary files

* further renaming

* make all tests pass

* finish

* remove useless test file

* Update tests/test_modeling_common.py

* Update utils/check_repo.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec_text.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix copies

* Update docs

* Remove fairseq data2vec_text script and fix format

* Add comment on where to get data2vec_text.py

* Remove mock implementation cheat.py and fix style

* Fix copies

* Remove TF and Flax classes from init

* Add back copy from fairseq data2vec_text.py and fix style

* Update model name in docs/source/index.mdx to be CamelCase

* Revert model name in table to lower-case to get check_table test to pass

* Update documentation

* Update src/transformers/models/data2vec/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/configuration_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/data2vec/modeling_data2vec.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Copy-paste Data2VecConfig from BertConfig

* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency

* Update config special tokens to match RoBERTa

* Split multiple assertions and add individual error messages

* Rename Data2VecModel to Data2VecForTextModel

* Add Data2Vec to _toctree.yml

* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings

* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).

* finish audio model

* finish audio file

* add inputs to logits to data2vec

* Update names and fix style, quality and repo consistency

* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initialization in positional conv layers. Move back configurations for audio and text to separate files.

* correct audio models

* correct config auto

* correct tok auto

* delete unnecessary files

* delete unnecessary files

* Update utils/tests_fetcher.py

* further renaming

* make all tests pass

* finish

* remove useless test file

* Update tests/test_modeling_common.py

* Update utils/check_repo.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/data2vec/modeling_data2vec_text.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Move data2vec tests to new structure

* Fix test imports for text tests

* Remove fairseq files

* Change paper link to arxiv

* Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation.

* Update text model checkpoint to be facebook/data2vec-text-base

* Add 'Copy from' statements and update paper links and docs

* fix copy from statements

* improve copied from

* correct more copied from statements

* finish copied from stuff

* make style

* add model to README

* add to master

Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-03-01 11:09:20 +01:00
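Hedged usage sketch for the Data2Vec text model added in #15507, using the `facebook/data2vec-text-base` checkpoint named in the commit message.

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/data2vec-text-base")
model = AutoModel.from_pretrained("facebook/data2vec-text-base")

inputs = tokenizer("data2vec learns from text, audio and images.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```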
ddbb485c41 [TF-PT-Tests] Fix PyTorch - TF tests for different GPU devices (#15846) 2022-02-28 15:46:46 -05:00
97f9b8a27b Fixing the timestamps with chunking. (#15843)
* Fixing the timestamps with chunking.

* The changes modified (and fixed) the striding tests.

* Adding a tokenizer test.

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Defense -> comment.

* Update src/transformers/models/wav2vec2/tokenization_wav2vec2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-28 21:00:21 +01:00
410e26c7ad Fix (deprecated) ONNX exporter to account for new tf2onnx API (#15856)
* Fix (deprecated) ONNX exporter to account for new tf2onnx API
2022-02-28 20:17:44 +01:00
e3342edc4e Flax Speech-Encoder-Decoder Model (#15613)
* rebase

* Delete shift tokens func

* downsample decoder input seq len for init

* correct attention mask

* add tests

* pt flax cross test

* make fixup

* init file for import

* change pt-flax cross test threshold

* pt-flax test logits only

* move tests

* make repo-consistency

* consistent indentation

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-28 12:22:36 +01:00
935a76d90d [UniSpeechSat] correct unispeech sat (#15847) 2022-02-28 11:23:13 +01:00
84eaa6acf5 Add TFConvNextModel (#15750)
* feat: initial implementation of convnext in tensorflow.

* fix: sample code for the classification model.

* chore: added check for  from the classification model.

* chore: set bias initializer in the classification head.

* chore: updated license terms.

* chore: removed unused imports

* feat: enabled  argument when using drop_path.

* chore: replaced tf.identity with layers.Activation(linear).

* chore: edited default checkpoint.

* fix: minor bugs in the initializations.

* partial-fix: tf model errors for loading pretrained pt weights.

* partial-fix: call method updated

* partial-fix: cross loading of weights (4x3 variables to be matched)

* chore: removed unneeded comment.

* removed playground.py

* rebasing

* rebasing and removing playground.py.

* fix: renaming TFConvNextStage conv and layer norm layers

* chore: added initializers and other minor additions.

* chore: added initializers and other minor additions.

* add: tests for convnext.

* fix: integration tester class.

* fix: issues mentioned in pr feedback (round 1).

* fix: how output_hidden_states arg is propagated inside the network.

* feat: handling of  arg for pure cnn models.

* chore: added a note on equal contribution in model docs.

* rebasing

* rebasing and removing playground.py.

* feat: encapsulation for the convnext trunk.

* Fix variable naming; Test-related corrections; Run make fixup

* chore: added Joao as a contributor to convnext.

* rebasing

* rebasing and removing playground.py.

* rebasing

* rebasing and removing playground.py.

* chore: corrected copyright year and added comment on NHWC.

* chore: fixed the black version and ran formatting.

* chore: ran make style.

* chore: removed from_pt argument from test, ran make style.

* rebasing

* rebasing and removing playground.py.

* rebasing

* rebasing and removing playground.py.

* fix: tests in the convnext subclass, ran make style.

* rebasing

* rebasing and removing playground.py.

* rebasing

* rebasing and removing playground.py.

* chore: moved convnext test to the correct location

* fix: locations for the test file of convnext.

* fix: convnext tests.

* chore: applied  sgugger's suggestion for dealing w/ output_attentions.

* chore: added comments.

* chore: applied updated quality environment style.

* chore: applied formatting with quality environment.

* chore: revert to the previous tests/test_modeling_common.py.

* chore: revert to the original test_modeling_common.py

* chore: revert to previous states for test_modeling_tf_common.py and modeling_tf_utils.py

* fix: tests for convnext.

* chore: removed output_attentions argument from convnext config.

* chore: revert to the earlier tf utils.

* fix: output shapes of the hidden states

* chore: removed unnecessary comment

* chore: reverting to the right test_modeling_tf_common.py.

* Styling nits

Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
2022-02-25 18:19:16 +01:00
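Hedged usage sketch for the TF ConvNeXt port added in #15750; the checkpoint name is an assumption based on the PyTorch ConvNeXt releases.

```python
import tensorflow as tf
from transformers import TFConvNextModel

# from_pt=True may be needed if TF weights are not yet published for this checkpoint
model = TFConvNextModel.from_pretrained("facebook/convnext-tiny-224")

pixel_values = tf.random.normal((1, 3, 224, 224))  # channels-first, handled internally (see NHWC note)
outputs = model(pixel_values)
print(outputs.last_hidden_state.shape)
```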
0b5bf6abef Framework split model report (#15825) 2022-02-25 12:00:00 -05:00
0118c4f6a8 Re-enable doctests for the quicktour (#15828)
* Re-enable doctests for the quicktour

* Re-enable doctests for task_summary (#15830)

* Remove &
2022-02-25 17:46:38 +01:00
fd5b05eb81 Add ONNX Runtime quantization for text classification notebook (#15817) 2022-02-25 11:29:35 -05:00
bf1fe32824 [examples/summarization and translation] fix readme (#15833) 2022-02-25 17:28:16 +01:00
8635407bc7 Fix tf.concatenate + test past_key_values for TF models (#15774)
* fix wrong method name tf.concatenate

* add tests related to causal LM / decoder

* make style and quality

* clean-up

* Fix TFBertModel's extended_attention_mask when past_key_values is provided

* Fix tests

* fix copies

* More tf.int8 -> tf.int32 in TF test template

* clean-up

* Update TF test template

* revert the previous commit + update the TF test template

* Fix TF template extended_attention_mask when past_key_values is provided

* Fix some styles manually

* clean-up

* Fix ValueError: too many values to unpack in the test

* Fix more: too many values to unpack in the test

* Add a comment for extended_attention_mask when there is past_key_values

* Fix TFElectra extended_attention_mask when past_key_values is provided

* Add tests to other TF models

* Fix for TF Electra test: add prepare_config_and_inputs_for_decoder

* Fix not passing training arg to lm_head in TFRobertaForCausalLM

* Fix tests (with past) for TF Roberta

* add testing for past_key_values for TFElectra model

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-25 17:11:46 +01:00
4818bf7aed HFTracer.trace should use/return self.graph to be compatible with torch.fx.Tracer (#15824) 2022-02-25 15:54:45 +01:00
ad0d7d1745 Adding the option to return_timestamps on pure CTC ASR models. (#15792)
* Adding the option to return_timestamps on pure CTC ASR models.

* Remove `math.prod` which was introduced in Python 3.8

* int are not floats.

* Reworking the PR to support "char" vs "word" output.

* Fixup!

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Quality.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-25 14:06:45 +01:00
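A hedged sketch of the feature from #15792: asking a pure-CTC ASR pipeline for timestamps alongside the transcription; per the commit, "char" and "word" granularities are supported. The audio path is hypothetical.

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# "word" groups characters into words; "char" returns per-character offsets
result = asr("sample.flac", return_timestamps="word")  # hypothetical audio file
print(result["text"])
for chunk in result.get("chunks", []):
    print(chunk["text"], chunk["timestamp"])
```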
7566734d6f Add model specific output classes to PoolFormer model docs (#15746)
* Added model specific output classes to poolformer docs

* Fixed Segformer typo in Poolformer docs
2022-02-25 13:43:56 +01:00
7963578fc5 Fix dummy_inputs() to dummy_inputs in symbolic_trace doc (#15776) 2022-02-25 11:32:23 +01:00
074645e32a Fix semantic segmentation pipeline test (#15826) 2022-02-25 09:21:29 +01:00
b7e292aebd Fix the push run (#15807) 2022-02-24 19:30:17 +01:00
cbf4391177 [TFXLNet] Correct tf xlnet generate (#15822)
* [TFXLNet] Correct tf xlnet

* adapt test comment
2022-02-24 19:23:34 +01:00
2f0f9038e2 [Barthez Tokenizer] Fix saving (#15815) 2022-02-24 19:09:09 +01:00
ca57b45071 [Unispeech] Fix slow tests (#15818)
* remove soundfile old way of loading audio

* Adapt slow test
2022-02-24 19:08:54 +01:00
35ecf99cc4 Revert changes in logit size for semantic segmentation models (#15722)
* Revert changes in logit size for semantic segmentation models

* Address review comments
2022-02-24 15:52:52 +01:00
d1fcc90abf Fix from_pretrained with default base_model_prefix (#15814) 2022-02-24 11:43:51 +01:00
7f921bcf47 Fix add-new-model-like when old model checkpoint is not found (#15805)
* Fix add-new-model-like command when old checkpoint can't be recovered

* Style
2022-02-24 08:58:18 +01:00
bb7949b35a Fix model templates (#15806)
* Fix model templates

* Update paths
2022-02-23 18:27:29 -05:00
309e87e25e Docker images should only run on a daily basis 2022-02-23 18:01:44 -05:00
c475f3ce2d Scheduled tests should only run on a daily basis 2022-02-23 17:52:22 -05:00
6336017c15 Fix build_documentation CI (#15803) 2022-02-23 21:53:51 +01:00
a0e3480699 [Test refactor 5/5] Build docker images (#15729) 2022-02-23 15:48:19 -05:00
4c737f0e40 [Test refactor 4/5] Improve the scheduled tests (#15728) 2022-02-23 15:48:05 -05:00
d3ae2bd3cf [Test refactor 3/5] Notification service improvement (#15727)
* Per-folder tests reorganization

* Review comments

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-02-23 15:46:59 -05:00
0400b2263d [Test refactor 2/5] Tests fetcher (#15726)
* Tests fetcher

* Review comments

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Review comments
2022-02-23 15:46:37 -05:00
29c10a41d0 [Test refactor 1/5] Per-folder tests reorganization (#15725)
* Per-folder tests reorganization

Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-02-23 15:46:28 -05:00
fecb08c2b8 🧼 NLP task guides (#15731)
* clean commit of changes to NLP tasks

* 🖍 apply feedback

* 📝 move tf data collator in multiple choice

Co-authored-by: Steven <stevhliu@gmail.com>
2022-02-23 13:58:33 -06:00
86636f52a9 Fix indent in doc-builder CI (#15798) 2022-02-23 20:01:33 +01:00
a1efc82362 HTML dev docs (#15678)
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
2022-02-23 19:43:22 +01:00
3f76bf54ff Align documentation with code defaults (#15468)
In the code, `do_normalize` defaults to True
2022-02-23 18:39:41 +01:00
32f5de10a0 [doc] custom_models: mention security features of the Hub (#15768)
* custom_models: tiny doc addition

* mention security feature earlier in the section

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2022-02-23 11:40:06 -05:00
9e71d46455 Enable image-segmentation on AutoModelForSemanticSegmentation (#15647)
* Enabling Beit and SegFormer in `image-segmentation`.

* Fixing the score.

* Fix import ?

* Missing in type hint.

* Multiple test fixes:

- Add `raw_image` support. It should be the default IMHO since in Python
  world it doesn't make any sense to base64 encode the image (Sorry
  @mishig, didn't catch that in my review). I really think we should
  consider breaking BC here.
- Add support for Segformer tiny test (needed
  `SegformerModelTester.get_config` to enable TinyConfig
  @NielsRogge)
- Add the check that `batch_size` works correctly on that pipeline.
  Uncovered that it doesn't for Detr, which IMO is OK since images
  after `feature_extractor` don't have the same size. Comment should
  explain.

* Type hint as a string.

* Make fixup + update black.

* torch+vision protections.

* Don't use torchvision, use F.interpolate instead (no new dep).

* Last fixes for Segformer.

* Update test to reflect new image (which was broken)

* Update tests.

* Major BC modification:

- Removed the string compressed PNG string, that's a job for users
`transformers` stays in python land.
- Removed the `score` for semantic segmentation. It has hardly a meaning
  on its own in this context.
- Don't include the grayscale with logits for now (which could enable
  users to get a sense of confidence). Might be done later.
- Don't include the surface of the mask (could be used for sorting by
  users, to filter out small masks). It's already calculable, and
  it's easier to add later, than to add now and break later if we need.

* `make fixup`.

* Small changes.

* Rebase + doc fixup.
2022-02-23 17:20:26 +01:00
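A hedged sketch of the `image-segmentation` pipeline with a SegFormer checkpoint, as enabled in #15647; the checkpoint and image path are assumptions.

```python
from transformers import pipeline

segmenter = pipeline("image-segmentation", model="nvidia/segformer-b0-finetuned-ade-512-512")
results = segmenter("scene.jpg")  # hypothetical local image path
for prediction in results:
    # after this PR each prediction carries a class label plus a binary mask
    print(prediction["label"])
```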
1b23979736 [ViLT] Fix checkpoint url in config (#15790)
* [ViLT] Fix checkpoint url in config

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-02-23 14:51:40 +01:00
de737866f2 [CLIP] fix grad ckpt (#15789) 2022-02-23 14:30:05 +01:00
a3e607d19e Supporting Merges.txt files that contain an endline. (#15782)
(`hf-internal-testing/tiny-clip` for instance)
2022-02-23 11:51:48 +01:00
24588c6731 [M2M100, XGLM] fix create_position_ids_from_inputs_embeds (#15751) 2022-02-23 10:46:42 +01:00
f9582c205a Adding ZeroShotImageClassificationPipeline (#12119)
* [Proposal] Adding ZeroShotImageClassificationPipeline

- Based on CLIP

* WIP, Resurrection in progress.

* Resurrection... achieved.

* Reword handling different `padding_value` for `feature_extractor` and
`tokenizer`.

* Thanks doc-builder !

* Adding docs + global namespace `ZeroShotImageClassificationPipeline`.

* Fixing templates.

* Make the test pass and be robust to floating error.

* Addressing suraj's comments on docs mostly.

* Tf support start.

* TF support.

* Update src/transformers/pipelines/zero_shot_image_classification.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Co-authored-by: Suraj Patil <surajp815@gmail.com>
2022-02-23 09:41:42 +01:00
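Hedged usage sketch for the CLIP-based zero-shot image classification pipeline added in #12119; the image path is hypothetical.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-image-classification", model="openai/clip-vit-base-patch32")
predictions = classifier(
    "cat.jpg",  # hypothetical local image path
    candidate_labels=["a photo of a cat", "a photo of a dog"],
)
print(predictions)  # list of {"label": ..., "score": ...} dicts
```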
05a12a090d Fix HfArgumentParser when passing a generator (#15758)
* Fix `HfArgumentParser` when passing a generator

* Add missing import

* Always convert `dataclass_types` into a list
2022-02-23 00:16:38 +01:00
db57bb2b71 Cleanup transformers-cli (#15767) 2022-02-22 15:58:05 -05:00
3db2e8f92b Fix typo on examples/pytorch/question-answering (#15644)
cna -> can
2022-02-22 13:51:07 -05:00
2cdb6dbee5 fixed pipeline code (#15607)
Co-authored-by: Boumadane Abdelmoumene <moumene.boumadane@gmail.com>
2022-02-22 13:46:21 -05:00
c44d3675c2 Time stamps for CTC models (#15687)
* [Wav2Vec2 Time Stamps]

* Add first version

* add word time stamps

* Fix

* save intermediate space

* improve

* [Finish CTC Tokenizer]

* remove @

* remove @

* push

* continue with phonemes

* up

* finish PR

* up

* add example

* rename

* finish

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct split

* finalize

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-22 19:26:44 +01:00
32295b15a1 Gelu10 (#15676)
* Add GeLU10 (clipped version of GeLU) to transformers to improve quantization performance.

* Add unittests.

* Import tensorflow after `is_tf_available` check.

* Fix tensorflow wrong function `tf.tensor` to `tf.constant`

* style.

* use `tf.math.max`

* Fix tf tests.

* style.

* style style style style style style

* style style style style style style

* Address @sgugger comments.

* Fix wrong operator for raising ValueError for ClippedGELUActivation.
2022-02-22 18:21:16 +01:00
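A minimal PyTorch sketch of the idea in #15676, not the library's exact implementation: a standard GELU whose output is clamped to a fixed range (here [-10, 10]) so that activations quantize better.

```python
import torch
import torch.nn.functional as F

def gelu_10(x: torch.Tensor) -> torch.Tensor:
    # standard GELU with outputs clamped to [-10, 10]
    return torch.clamp(F.gelu(x), min=-10.0, max=10.0)

print(gelu_10(torch.tensor([-20.0, 0.0, 20.0])))  # approximately tensor([0., 0., 10.])
```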
2c3fcc647a TF train_step docstring (#15755)
* TF train_step docstring
2022-02-22 11:18:35 +00:00
38bed912e3 added link to our writing-doc document (#15756) 2022-02-22 09:57:28 +01:00
0187c6f0ad revert temporary addition to test next version of CLIPTokenizerFast (#15717) 2022-02-21 18:30:11 +01:00
3956b133b6 TF text classification examples (#15704)
* Working example with to_tf_dataset

* updated text_classification

* more comments
2022-02-21 17:17:59 +00:00
142b69f24b Add layer_idx to CrossAttention of GPT2 model (#15730)
* Add layer_idx to CrossAttention

* Add layer_idx to crossattention of ImageGPT model
2022-02-21 17:31:39 +01:00
86119c1154 add VisionTextDualEncoder and CLIP fine-tuning script (#15701)
* begin script

* update script

* fix features and data args

* main

* add requirements

* add column name args

* fix captions

* don't jit transforms

* fix caption

* fix labels, handle attention mask

* convert pixel values to numpy

* labels => input_ids

* transform images on the fly

* use AutoModel class, create the hybrid model outside of the script

* fix version message

* add readme

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* address review comments

* add more comments

* allow freezing vision and text models

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-21 16:10:59 +01:00
5444687f0f Fix minor comment typos (#15740) 2022-02-21 12:41:27 +01:00
a63bd3675f Remove input and target reset after preprocessing (#15741)
Remove input and target reset after preprocessing
2022-02-21 11:10:15 +01:00
2c2a31ffbc Add missing PLBart entry in README (#15721)
* Add missing PLBart entry in index

* Fix README

* Fix README

* Fix style

* Change to master model doc
2022-02-18 21:11:42 +01:00
60ba48205e fix bug in PT speech-encoder-decoder (#15699)
* fix bug in PT speech-encoder-decoder

* add pt test for `inputs is not None`

* fix test

* new pt test

* Update tests/test_modeling_speech_encoder_decoder.py

* make fixup

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-18 18:20:24 +01:00
3de12906c8 fix: hfdeepspeed config argument (#15711)
`HfDeepSpeedConfig` accepts a dictionary or path to `.json` file containing DS configurations, not `TrainingArguments`.
2022-02-18 12:00:02 -05:00
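A hedged sketch of the corrected usage described in #15711: pass `HfDeepSpeedConfig` a dict (or a path to a `.json` file), not a `TrainingArguments` object. The config values are illustrative.

```python
from transformers.deepspeed import HfDeepSpeedConfig  # requires the deepspeed package

# a plain dict works; a path to a .json file with the same content works too
ds_config = {"train_micro_batch_size_per_gpu": 1, "zero_optimization": {"stage": 3}}
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive before instantiating the model
```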
83f45cd656 Fix auto (#15706) 2022-02-18 08:50:23 -05:00
d5083c333f style_doc handles decorators in examples (#15719) 2022-02-18 14:49:53 +01:00
ae1f835028 Add PLBart (#13269)
* Init PLBART

* Add missing configuration file

* Add conversion script and configuration file

* Fix style

* Update modeling and conversion scripts

* Fix scale embedding in config

* Add comment

* Fix conversion script

* Add classification option to conversion script

* Fix vocab size in config doc

* Add tokenizer files from MBart50

* Allow no lang code in regular tokenizer

* Add PLBart Tokenizer Converters

* Remove mask from multi tokenizer

* Remove mask from multi tokenizer

* Change from MBart-50 to MBart tokenizer

* Fix names and modify src/tgt behavior

* Fix imports for tokenizer

* Remove <mask> from multi tokenizer

* Fix style

* Change tokenizer_class to processor_class

* Add attribute map to config class

* Update modeling file to modified MBart code

* Update configuration file to MBart style configuration

* Fix tokenizer

* Separate tokenizers

* Fix error in tokenization auto

* Copy MBart tests

* Replace with MBart tokenization tests

* Fix style

* Fix language code in multi tokenizer

* Fix configuration docs

* Add entry for plbart_multi in transformers init

* Add dummy objects and fix imports

* Fix modeling tests

* Add TODO in config

* Fix copyright year

* Fix modeling docs and test

* Fix some tokenization tests and style

* Add changes from review

* Fix copies

* Fix docs

* Fix docs

* Fix style

* Fix year

* Add changes from review

* Remove extra changes

* Fix base tokenizer and doc

* Fix style

* Fix modeling and slow tokenizer tests

* Remove Multi-tokenizer Converter and Tests

* Delete QA model and Multi Tokenizer dummy objects

* Fix repo consistency and code quality issues

* Fix example documentation

* Fix style

* Remove PLBartTokenizer from type checking in init

* Fix consistency issue

* Add changes from review

* Fix style

* Remove PLBartTokenizerFast

* Remove FastTokenizer converter

* Fix AutoTokenzier mapping

* Add plbart to toctree and fix consistency issues

* Add language codes tokenizer test

* Fix styling and doc issues

* Add fixes for failing tests

* Fix copies

* Fix failing modeling test

* Change assert to assertTrue in modeling tests
2022-02-18 14:17:09 +01:00
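Hedged usage sketch for PLBart (#13269); the checkpoint name is an assumption based on the released PLBart weights, and the input snippet is arbitrary.

```python
from transformers import PLBartForConditionalGeneration, PLBartTokenizer

tokenizer = PLBartTokenizer.from_pretrained("uclanlp/plbart-base")
model = PLBartForConditionalGeneration.from_pretrained("uclanlp/plbart-base")

inputs = tokenizer("def add(a, b): return a + b", return_tensors="pt")
generated = model.generate(**inputs, max_length=32)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```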
2f2fefd6af Fix LongformerModel hidden states (#15537)
* add undo padding

* fix

* fix tuple issue

* make style and quality

* move unpad logic to LongformerEncoder + unpad attentions + update tests

* move unpad logic to TFLongformerEncoder

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-18 13:56:53 +01:00
68dec6bffd Fix DETR model deprecation warnings for int div (#15702) 2022-02-18 15:14:44 +03:00
f8ff3fad87 TF: add initializer_std with a small value in TFFunnelModelTester (#15684) 2022-02-18 11:20:07 +00:00
416dff736c Fix SiluActivation (#15718) 2022-02-18 11:57:39 +01:00
e93763d420 fix CLIP fast tokenizer and change some properties of the slow version (#15067)
Very big changes to the CLIP fast tokenizer, which did not match the CLIP slow tokenizer

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-18 10:21:30 +01:00
240cc6cbdc Adding a model, more doc for pushing to the hub (#15690)
* doc for adding a model to the hub

* run make style

* resolved conversation

* removed a line

* removed )

* Update docs/source/add_new_model.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/add_new_model.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-18 09:11:18 +01:00
57882177be Add SimMIM (#15586)
* Add first draft

* Make model importable

* Make SwinForMaskedImageModeling importable

* Fix imports

* Add missing inits

* Add support for Swin

* Fix bug

* Fix bug

* Fix another bug

* Fix Swin MIM implementation

* Fix default encoder stride

* Fix Swin

* Add print statements for debugging

* Add image_size data argument

* Fix Swin

* Fix image_size

* Add print statements for debugging

* Fix print statement

* Remove print statements

* Improve reshaping of bool_masked_pos

* Add support for DeiT, fix tests

* Improve docstrings

* Apply new black version

* Improve script

* Fix bug

* Improve README

* Apply suggestions from code review

* Remove DS_Store and add to gitignore

* Apply suggestions from code review + fix BEiT Flax

* Revert BEiT changes

* Improve README

* Fix code quality

* Improve README

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-02-17 19:44:55 +01:00
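A hedged sketch of `SwinForMaskedImageModeling` from #15586: the model takes pixel values plus a boolean patch mask (`bool_masked_pos`) and reconstructs the masked patches, SimMIM-style. A randomly initialized config is used for illustration.

```python
import torch
from transformers import SwinConfig, SwinForMaskedImageModeling

config = SwinConfig()  # image_size=224, patch_size=4 by default
model = SwinForMaskedImageModeling(config)

pixel_values = torch.randn(1, 3, 224, 224)
num_patches = (config.image_size // config.patch_size) ** 2
bool_masked_pos = torch.randint(0, 2, (1, num_patches)).bool()  # which patches to mask

outputs = model(pixel_values, bool_masked_pos=bool_masked_pos)
print(outputs.loss)  # reconstruction loss over the masked patches
```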
426b96230a Fix shapes in model docstrings (#15696) 2022-02-17 08:42:14 -05:00
92a537d938 Minor fix on README.md (#15688)
* fix README

* fix more arxiv links

* make fix-copies

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-17 08:38:32 -05:00
f84e0dbd2a Add PoolFormer (#15531)
* Added all files, PoolFormerFeatureExtractor still failing tests

* Fixed PoolFormerFeatureExtractor not being able to import

* Completed Poolformer doc

* Applied Suggested fixes

* Fixed errors in modeling_auto.py

* Fix feature extractor, convert docs to Markdown, styling of code

* Remove PoolFormer from check_repo and fix integration test

* Remove Poolformer from check_repo

* Fixed configuration_poolformer.py docs and removed inference.py from poolformer

* Ran with black v22

* Added PoolFormer to _toctree.yml

* Updated poolformer doc

* Applied suggested fixes and added on README.md

* Did make fixup and make fix-copies, tests should pass now

* Changed PoolFormer weights conversion script name and fixed README

* Applied fixes in test_modeling_poolformer.py and modeling_poolformer.py

* Added PoolFormerFeatureExtractor to AutoFeatureExtractor API

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
2022-02-17 13:16:37 +01:00
0e91f885c3 Add image classification notebook (#15667)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
2022-02-17 13:14:01 +01:00
f65fe3663a Implementation of activations as pytorch modules (#15616)
* Implement activations as pytorch modules

* Apply fixup

* Add missing tests for activations

* Update docstring

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-16 14:37:52 -05:00
66828a19b1 Fix Funnel configuration doc (#15686)
* fix doc

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-16 11:50:36 -05:00
3a4376d008 [Wav2Vec2ProcessorWithLM] Fix auto processor with lm (#15683) 2022-02-16 17:33:33 +01:00
cdc51ffd27 Add register method to AutoProcessor (#15669)
* Add push_to_hub method to processors

* Fix test

* The other one too!

* Add register method to AutoProcessor

* Update src/transformers/models/auto/processing_auto.py

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-02-16 09:13:33 -05:00
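
A minimal sketch of the register API added in #15669, assuming a hypothetical `CustomConfig`/`CustomProcessor` pair (the class attributes shown are illustrative, not part of the PR text):

```
from transformers import AutoConfig, AutoProcessor, PretrainedConfig, ProcessorMixin

class CustomConfig(PretrainedConfig):
    # hypothetical config class; `model_type` keys the auto-mapping
    model_type = "custom"

class CustomProcessor(ProcessorMixin):
    # hypothetical processor bundling a feature extractor and a tokenizer
    feature_extractor_class = "AutoFeatureExtractor"
    tokenizer_class = "AutoTokenizer"

# Register the pair so AutoProcessor.from_pretrained can resolve it
AutoConfig.register("custom", CustomConfig)
AutoProcessor.register(CustomConfig, CustomProcessor)
```
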
bc3379e12c 🔥 Remove build_doc_test github action (#15680) 2022-02-16 14:06:26 +01:00
d4692ad161 Fix dec_attn_mask in TFTransfoXLMainLayer (#15665)
* fix attn

* clean-up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-16 11:53:26 +00:00
b87c044c79 Usage examples for logger (#15657)
* logger

* Update docs/source/main_classes/logging.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update docs/source/main_classes/logging.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-02-16 10:15:13 +01:00
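
The usage pattern documented in #15657 looks roughly like this (verbosity levels follow the standard `logging` hierarchy):

```
from transformers.utils import logging

logging.set_verbosity_info()  # show INFO and above for all transformers modules
logger = logging.get_logger("transformers")
logger.info("This INFO message is now visible")
logger.warning("Warnings are shown at this verbosity too")
```
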
2d02f7b29b Add push_to_hub method to processors (#15668)
* Add push_to_hub method to processors

* Fix test

* The other one too!
2022-02-15 21:14:04 -05:00
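
In practice the new method mirrors the one on models and tokenizers; a sketch (the repo name is illustrative):

```
from transformers import Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
# Uploads the feature extractor and tokenizer files to the Hub
processor.push_to_hub("my-username/my-wav2vec2")
```
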
bee361c6f1 [t5/t0/mt5 models] faster/leaner custom layer norm (#14656)
* [t5] faster/leaner custom layer norm

* wip

* apex.normalization.FusedRMSNorm

* cleanup

* cleanup

* add doc

* add catch all

* Trigger CI

* expand
2022-02-15 16:49:57 -08:00
e3d1a8dabc Add a missing space in a deprecation message (#15651) 2022-02-15 19:12:30 -05:00
1ddf3c2b74 Fix vit test (#15671) 2022-02-15 18:55:38 -05:00
943e2aa036 Fix model equivalence tests (#15670)
* Fix model equivalence tests

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-15 18:55:22 -05:00
1690319217 Fix TFSequenceSummary's activation (#15643)
* fix TFSequenceSummary

* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-15 19:15:42 +00:00
faf4ff5974 [pipeline doc] fix api (#15660)
* [pipeline doc] fix api

* remove duplicate
2022-02-15 10:13:08 -08:00
2e12b907ae TF generate refactor - Greedy Search (#15562)
* TF generate start refactor

* Add tf tests for sample generate

* re-organize

* boom boom

* Apply suggestions from code review

* re-add

* add all code

* make random greedy pass

* make encoder-decoder random work

* further improvements

* delete bogus file

* make gpt2 and t5 tests work

* finish logits tests

* correct logits processors

* correct past / encoder_outputs drama

* refactor some methods

* another fix

* refactor shape_list

* fix more shape list

* import shape_list

* finish docs

* fix imports

* make style

* correct tf utils

* Fix TFRag as well

* Apply Lysandre's and Sylvais suggestions

* Update tests/test_generation_tf_logits_process.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update src/transformers/tf_utils.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* remove cpu according to gante

* correct logit processor

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2022-02-15 17:54:43 +01:00
a3dbbc3467 Add decoder_kwargs to send to LM on asr pipeline. (#15646)
Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>

Co-authored-by: Giuseppe Attanasio <giuseppeattanasio6@gmail.com>
2022-02-15 17:53:24 +01:00
cdf19c501d Re-export KeyDataset. (#15645)
* Re-export `KeyDataset`.

* Update the docs locations.
2022-02-15 17:49:38 +01:00
28e6155d8a add a network debug script and document it (#15652)
* add a network debug script and document it

* doc
2022-02-15 08:48:00 -08:00
5d8be090e0 Fix quality 2022-02-15 11:32:26 -05:00
f45ac11fb3 Add section about doc testing (#15659)
* Add doctesting section

* Improve

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-15 16:56:31 +01:00
80f1a59168 updated with latest PL and Ray (#15653) 2022-02-15 16:53:05 +01:00
7bc4a01cb5 Update bad_words_ids usage (#15641)
* Improve the `bad_words_ids` parameter usage

* Update the bad_words_ids strategy
2022-02-15 16:44:34 +01:00
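
The recommended pattern after this change, sketched below; `add_prefix_space=True` matters for BPE tokenizers such as GPT-2's, so the banned ids also match mid-sentence occurrences (the banned words here are just placeholders):

```
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Encode banned words without special tokens
bad_words_ids = tokenizer(
    ["offensive", "banned"], add_prefix_space=True, add_special_tokens=False
).input_ids

inputs = tokenizer("The review said the movie was", return_tensors="pt")
outputs = model.generate(**inputs, bad_words_ids=bad_words_ids, max_length=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
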
67047b86ce add scores to Wav2Vec2WithLMOutput (#15413)
* add scores to Wav2Vec2WithLMOutput

* style fixup
2022-02-15 16:40:50 +01:00
45f56580a7 Allow custom code for Processors (#15649)
* Allow custom code for Processors

* Add more test

* Test all auto_map configs are properly set
2022-02-15 09:44:35 -05:00
86a7845c0c Fix typo in speech2text2 doc (#15617)
Forward looks for inputs, not input_ids
2022-02-15 13:54:34 +01:00
9eb7e9ba1d Fix ASR pipelines from local directories with wav2vec models that have language models attached (#15590)
* Fix loading pipelines with wav2vec models with lm when in local paths

* Adding tests

* Fix test

* Adding tests

* Flake8 fixes

* Removing conflict files :(

* Adding task type to test

* Remove unnecessary test and imports
2022-02-15 13:45:08 +01:00
e1cbc073bf Require tokenizers>=0.11.1 (#15266)
Requires a `tokenizers` version that supports choosing the truncation direction
2022-02-15 11:46:12 +01:00
05a8580964 Revert "logger doc"
This reverts commit 41168a49ce61685ac5c9c38cd5b88fd883c0d811.
2022-02-15 10:46:45 +01:00
41168a49ce logger doc 2022-02-15 10:03:28 +01:00
041fdc4a7e [SpeechEncoderDecoder] Make sure no EOS is generated in test (#15655) 2022-02-15 09:13:55 +01:00
e314c19a3f fix bug for the log of RNG states are not properly loaded exception. (#15638)
Co-authored-by: muz <muzhi1991@limuzhideMBP-2.lan>
2022-02-14 20:30:55 -05:00
2e11a04337 Register feature extractor (#15634)
* Rework AutoFeatureExtractor.from_pretrained internal

* Custom feature extractor

* Add more tests

* Add support for custom feature extractor code

* Clean up

* Add register API to AutoFeatureExtractor
2022-02-14 13:35:16 -05:00
0f71c29053 Remove redundant error logging in from_pretrained() method (#15631)
* Remove error logging in from_pretrained() method
2022-02-14 18:03:07 +01:00
b090b79022 Make Swin work with VisionEncoderDecoderModel (#15527)
* Add attribute_map

* Add mention in docs

* Set hidden_size attribute correctly

* Add note about Transformer-based models only

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
2022-02-14 17:33:35 +01:00
ec15da2445 Report only the failed imports in requires_backends (#15636) 2022-02-14 10:35:20 -05:00
2b8599b2df Fix a bug that ignores max_seq_len in preprocess (#15238) 2022-02-14 13:18:40 +01:00
f52746d004 [Fix doc example] FlaxVisionEncoderDecoder (#15626)
* Fix wrong checkpoint name: vit

* Fix missing import

* Fix more missing import

* make style

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-02-14 12:48:23 +01:00
52d2e6f6e9 Add push to hub to feature extractor (#15632)
* Add push to hub to feature extractor

* Quality

* Clean up
2022-02-11 17:14:01 -05:00
4f403ea899 Fix grammar in tokenizer_summary (#15614)
"to make ensure" is redundant.
2022-02-11 16:51:30 -05:00
7a32e4722f Custom feature extractor (#15630)
* Rework AutoFeatureExtractor.from_pretrained internal

* Custom feature extractor

* Add more tests

* Add support for custom feature extractor code

* Clean up
2022-02-11 16:43:54 -05:00
fcb0f74397 [research_projects] deal with security alerts (#15594)
* [research_projects] deal with security alerts

* add a note of the original PL ver and warning
2022-02-11 14:31:09 -05:00
f15c99fabf [deepspeed docs] misc additions (#15585)
* [deepspeed docs] round_robin_gradients

* training and/or eval/predict loss is

* Update docs/source/main_classes/deepspeed.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-11 10:54:04 -08:00
2dce350b33 Fix _configuration_file argument getting passed to model (#15629) 2022-02-11 13:46:08 -05:00
85aee09e9a 🖍 remove broken link (#15615) 2022-02-11 12:33:55 -06:00
2f40c728c9 TF MT5 embeddings resize (#15567)
* Fix TF MT5 vocab resize

* more assertive testing
2022-02-11 17:35:10 +00:00
8c03df1010 Rebase (#15606) 2022-02-11 12:02:02 -05:00
3fae83d23a TF: Add informative warning for inexistent CPU backprop ops (#15612)
* Add informative warning
2022-02-11 16:16:26 +00:00
7e4844fc2a Enable ONNX export when PyTorch and TensorFlow installed in the same environment (#15625) 2022-02-11 16:25:06 +01:00
6cf06d198c Mark "code in the Hub" API as experimental (#15624) 2022-02-11 09:55:31 -05:00
45c7b5b1c7 [Generate] Small refactor (#15611) 2022-02-10 18:29:27 +01:00
c0864d98ba Correct JSON format (#15600) 2022-02-10 09:02:03 -08:00
2e8b85f72e Add local and TensorFlow ONNX export examples to docs (#15604)
* Add local and TensorFlow ONNX export examples to docs

* Use PyTorch - TensorFlow split
2022-02-10 16:31:00 +01:00
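
For reference, the local-export flow documented there: export with `python -m transformers.onnx --model=distilbert-base-uncased onnx/`, then run the result with ONNX Runtime, roughly as follows (the output name `last_hidden_state` is assumed for this checkpoint):

```
from onnxruntime import InferenceSession
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
session = InferenceSession("onnx/model.onnx")

# Tokenize to NumPy tensors, which ONNX Runtime consumes directly
inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
print(outputs[0].shape)
```
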
3a2ed96714 Fix Seq2SeqTrainer (#15603)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
2022-02-10 16:26:14 +01:00
724e51c6e6 Compute loss independent from decoder for TF EncDec models (as #14139) (#15175)
* Compute loss independent from decoder (as 14139)

* fix expected seq_len + style

* Apply the same change to TFVisionEncoderDecoderModel

* fix style

* Add case with labels in equivalence test

* uncomment

* Add case with labels in equivalence test

* add decoder_token_labels

* use hf_compute_loss

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add copied from

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-02-10 15:47:02 +01:00
3d5dea9bf0 Add example batch size to all commands (#15596) 2022-02-10 08:52:07 -05:00
cb7ed6e083 Add Tensorflow handling of ONNX conversion (#13831)
* Add TensorFlow support for ONNX export

* Change documentation to mention conversion with Tensorflow

* Refactor export into export_pytorch and export_tensorflow

* Check model's type instead of framework installation to choose between TF and Pytorch

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Alberto Bégué <alberto.begue@della.ai>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2022-02-10 11:18:41 +01:00
e923917cd9 Reformat tokenization_fnet 2022-02-09 22:23:32 -05:00
644ec05233 Make slow tests slow 2022-02-09 19:10:22 -05:00
c722753afd Expand tutorial for custom models (#15587)
* Expand tutorial for custom models

* Style

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
2022-02-09 17:44:28 -05:00
a86ee2261e Add link (#15588)
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
2022-02-09 23:33:39 +01:00
dee17d5676 [trainer docs] document how to select specific gpus (#15551)
* [trainer docs] document how to select specific gpus

* expand

* add urls

* add accelerate launcher
2022-02-09 10:12:29 -08:00
258480864d update serving_output for some TF models (#15568)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-09 18:32:51 +01:00
315e67404d Fix tests hub failure (#15580)
* Expose hub test problem

* Fix tests
2022-02-09 12:27:59 -05:00
b1ba03e082 Fix quality 2022-02-09 12:06:59 -05:00
eed3186b79 Trigger doc build 2022-02-09 11:57:59 -05:00
2b5603f6ac Constrained Beam Search [without disjunctive decoding] (#15416)
* added classes to get started with constrained beam search

* in progress, think i can directly force tokens now but not yet with the round robin

* think now i have total control, now need to code the bank selection

* technically works as desired, need to optimize and fix design choices leading to undesirable outputs

* complete PR #1 without disjunctive decoding

* removed incorrect tests

* Delete k.txt

* Delete test.py

* Delete test.sh

* revert changes to test scripts

* genutils

* full implementation with testing, no disjunctive yet

* shifted docs

* passing all tests realistically ran locally

* removing accidentally included print statements

* fixed source of error in initial PR test

* fixing the get_device() vs device trap

* fixed documentation docstrings about constrained_beam_search

* fixed tests failing for Speech2TextModel's floating point inputs

* fix cuda long tensor

* added examples and testing for them and found & fixed a bug in beam_search and constrained_beam_search

* deleted accidentally added test halting code with assert False

* code reformat

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_generation_utils.py

* fixing based on comments on PR

* took out the testing code that should work but fails without the beam search modification; style changes

* fixing comments issues

* docstrings for ConstraintListState

* typo in PhrasalConstraint docstring

* docstrings improvements

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-09 16:59:26 +01:00
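
A minimal sketch of the `constraints` argument to `generate()` that this PR describes (the checkpoint and forced phrase are illustrative; `PhrasalConstraint` takes a plain list of token ids):

```
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, PhrasalConstraint

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

# Force the token sequence for "Sie" to appear somewhere in the output
constraint = PhrasalConstraint(tokenizer("Sie", add_special_tokens=False).input_ids)

inputs = tokenizer("translate English to German: How old are you?", return_tensors="pt")
outputs = model.generate(**inputs, constraints=[constraint], num_beams=5, max_length=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
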
0113aae5b7 Add implementation of typical sampling (#15504)
* typical decoding

* changing arg name

* add test config params

* forgotten arg rename

* fix edge case where scores are same

* test for typical logits warper

* code quality fixes
2022-02-09 16:48:41 +01:00
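
After the argument rename mentioned in the commit body, usage is presumably a single extra flag on `generate()`; a sketch assuming the final name `typical_p`:

```
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Today is a beautiful day, and", return_tensors="pt")
# typical_p keeps only tokens whose information content is close to the expected value
outputs = model.generate(**inputs, do_sample=True, typical_p=0.2, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
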
f588cf4050 [Flax tests/FlaxBert] make from_pretrained test faster (#15561) 2022-02-09 16:48:08 +01:00
7029240927 Upgrade click version (#15579) 2022-02-09 10:28:43 -05:00
9e00566b9b Add Wav2Vec2 Adapter Weights to Flax (#15566)
* Add Wav2Vec2 Adapter Weights to Flax

* Suggested changes
2022-02-09 10:24:40 -05:00
1f60bc46f3 Make sure custom configs work with Transformers (#15569)
* Make sure custom configs work with Transformers

* Apply code review suggestions
2022-02-09 10:04:44 -05:00
7732d0fe7a Upgrade black to version ~=22.0 (#15565)
* Upgrade black to version ~=22.0

* Check copies

* Fix code
2022-02-09 09:28:57 -05:00
d923f76203 add model scaling section (#15119)
* add model scaling section

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* integrate reviewer feedback

* initialize GPU properly

* add note about BnB optimizer

* move doc from `scaling.mdx` to `performance.mdx`

* integrate reviewer feedback

* revert section levels

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-09 15:27:30 +01:00
b5c6fdecf0 PoC for a ProcessorMixin class (#15549)
* PoC for a ProcessorMixin class

* Documentation

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Roll out to other processors

* Add base feature extractor class in init

* Use args and kwargs

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-09 09:24:49 -05:00
ba3f9a71a1 logger.warn --> logger.warning (#15572)
* change logger.warn to logger.warning

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-09 08:20:05 -05:00
a6885db912 [Flax tests] fix test_model_outputs_equivalence (#15571)
* fix test_model_outputs_equivalence

* fix tuple outputs for blenderbot
2022-02-09 12:26:48 +01:00
fcb4f11c92 📝 Add codecarbon callback to docs (#15563) 2022-02-08 14:10:53 -05:00
077c00c0b2 feat(flax): allow encoder_outputs in generate (#15554)
* feat(flax): allow encoder_outputs in generate

* doc(flax): encoder_outputs in generate

* fix: style

* fix: style
2022-02-08 17:53:22 +01:00
8406fa6dd5 Add TFSpeech2Text (#15113)
* Add wrapper classes

* convert inner layers to tf

* Add TF Encoder and Decoder layers

* TFSpeech2Text models

* Loadable model

* TF model with same outputs as PT model

* test skeleton

* correct tests and run the fixup

* correct attention expansion

* TFSpeech2Text past_key_values with TF format
2022-02-08 16:27:23 +00:00
6a5472a8e1 Force use_cache to be False in PyTorch (#15385)
* use_cache = False for PT models if labels is passed

* Fix for BigBirdPegasusForConditionalGeneration

* add warning if users specify use_cache=True

* Use logger.warning instead of warnings.warn

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-08 16:20:53 +01:00
0acd84f7cb [GPTJ] fix docs (#15558) 2022-02-08 15:54:19 +01:00
87d08afb16 electra is added to onnx supported model (#15084)
* electra is added to onnx supported model

* add google/electra-base-generator for test onnx module

Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
2022-02-08 15:47:49 +01:00
0fe17f375a FX tracing improvement (#14321)
* Change the way tracing happens, enabling dynamic axes out of the box

* Update the tests and modeling xlnet

* Add the non recoding of leaf modules to avoid recording more values for the methods to record than what will be seen at tracing time (which would otherwise desynchronize the recorded values and the values that need to be given to the proxies during tracing, causing errors).

* Comments and making tracing work for gpt-j and xlnet

* Refactor things related to num_choices (and batch_size, sequence_length)

* Update fx to work on PyTorch 1.10

* Postpone autowrap_function feature usage for later

* Add copyrights

* Remove unnecessary file

* Fix issue with add_new_model_like

* Apply suggestions
2022-02-07 22:25:33 +01:00
552f8d3091 Create a custom model guide (#15489)
* 📝 add config section

* 📝 finish first draft

* 📝 add feature extractor and processor

* 🖍 apply feedback from review

* 📝 minor edits

* last review
2022-02-07 12:34:56 -06:00
ad1d3c4d4b Make TF Wav2Vec2 outputs the same as PT's version (#15530)
* fix outputs

* fix for CTC

* fix doc

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-07 18:09:57 +01:00
131e258411 Fix TF T5/LED missing cross attn in retrun values (#15511)
* add cross attn to outputs

* add cross attn to outputs for TFLED

* add undo padding

* remove unused import

* fix style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-07 17:41:48 +01:00
6775b211b6 Remove Longformers from ONNX-supported models (#15273) 2022-02-07 17:32:13 +01:00
7a1412e12b Wav2Vec2 models must either throw or deal with add_adapter (#15409)
* Wav2Vec2 models must either throw or deal with add_adapter

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Add pre-add_adapter backwards compatibility

* Add pre-add_adapter backwards compatibility

* Fix issue in tests/test_modeling_wav2vec2.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-07 17:03:12 +01:00
a459f7f97d Add ASR CTC streaming example (#15309)
* Single-epoch run

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Infinite dataset

* Trainer fix + distributed benchmark

* Benchmark fix

* unused import

* interleaved splits

* interleaved splits

* has_length util

* Move to research projects

* Leftover Sized checks

* Bump min version

* Unused import

* Revert trainer changes

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-02-07 18:35:37 +03:00
75b13f82e9 [Trainer] Deeper length checks for IterableDatasetShard (#15539)
* Unused import

* Make `has_length()` torch-independent to use in callbacks

* Update src/transformers/trainer_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-07 18:34:56 +03:00
84eec9e6ba Add ConvNeXT (#15277)
* First draft

* Add conversion script

* Improve conversion script

* Improve docs and implement tests

* Define model output class

* Fix tests

* Fix more tests

* Add model to README

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply more suggestions from code review

* Apply suggestions from code review

* Rename dims to hidden_sizes

* Fix equivalence test

* Rename gamma to gamma_parameter

* Clean up conversion script

* Add ConvNextFeatureExtractor

* Add corresponding tests

* Implement feature extractor correctly

* Make implementation cleaner

* Add ConvNextStem class

* Improve design

* Update design to also include encoder

* Fix gamma parameter

* Use sample docstrings

* Finish conversion, add center cropping

* Replace nielsr by facebook, make feature extractor tests smaller

* Fix integration test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-07 16:11:37 +01:00
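
Typical inference with the new classes, sketched under the assumption that the `facebook/convnext-tiny-224` checkpoint mentioned in the PR bullets is used:

```
import requests
import torch
from PIL import Image
from transformers import ConvNextFeatureExtractor, ConvNextForImageClassification

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ConvNextFeatureExtractor.from_pretrained("facebook/convnext-tiny-224")
model = ConvNextForImageClassification.from_pretrained("facebook/convnext-tiny-224")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```
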
c47d259241 [torch_int_div] Correct true division in generation (#15498)
* [torch_int_div] Correct true division in generation

* up

* up
2022-02-07 16:04:18 +01:00
5f1918a4a8 [ASR pipeline] correct asr pipeline for seq2seq models (#15541) 2022-02-07 15:35:44 +01:00
e02bdce791 Revert "Handle PyTorch to Flax conversion of 1D convolutions (#15519)" (#15540)
This reverts commit 854a0d526c7a3b958a790e92272ac798ca3831f5.
2022-02-07 12:33:49 +01:00
8ce1330631 [deepspeed docs] DeepSpeed ZeRO Inference (#15486)
* [deepspeed docs] DeepSpeed ZeRO Inference

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* tweak

* deal with black

* extra cleanup, better comments

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-02-04 13:51:02 -08:00
ac6aa10f23 Standardize semantic segmentation models outputs (#15469)
* Standardize instance segmentation models outputs

* Rename output

* Update src/transformers/modeling_outputs.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add legacy argument to the config and model forward

* Update src/transformers/models/beit/modeling_beit.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Copy fix in Segformer

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2022-02-04 14:52:07 -05:00
31be2f45a9 [deepspeed docs] Megatron-Deepspeed info (#15488) 2022-02-04 11:15:13 -08:00
bbe9c6981b Fix TFRemBertEncoder all_hidden_states (#15510)
* fix

* fix test

* remove expected_num_hidden_layers

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-04 16:32:14 +00:00
854a0d526c Handle PyTorch to Flax conversion of 1D convolutions (#15519) 2022-02-04 17:08:03 +01:00
486260c68e use kwargs (#15509)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-04 15:25:37 +00:00
525dbbf84a Remove loss from some flax models docs & examples (#15492)
* Remove return_loss from Flax models

* fix more

* fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-03 21:39:46 +01:00
21dcaec5d5 [deepspeed docs] memory requirements (#15506) 2022-02-03 10:55:14 -08:00
f1a4c4ead5 [WIP] Add preprocess_logits_for_metrics Trainer param (#15473)
* Add preprocess_logits_for_metrics Trainer param

* Compute accuracy in LM examples

* Improve comments
2022-02-03 12:07:20 -05:00
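
The point of the new parameter is to shrink logits before they are accumulated for `compute_metrics`; a sketch (`model` and `eval_dataset` are placeholders for objects set up elsewhere):

```
from transformers import Trainer, TrainingArguments

def preprocess_logits_for_metrics(logits, labels):
    # Keep only argmax ids instead of full vocab-sized logits, saving memory
    return logits.argmax(dim=-1)

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    return {"accuracy": float((preds == labels).mean())}

trainer = Trainer(
    model=model,  # placeholder: any model/dataset pair prepared earlier
    args=TrainingArguments(output_dir="out"),
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)
```
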
4f5faaf044 [deepspeed] fix a bug in a test (#15493)
* [deepspeed] fix a bug in a test

* consistency
2022-02-03 08:55:45 -08:00
90166121ee Add general vision docstrings (#15501)
* Add general docstrings

* Remove legacy docstrings

* Add BEiT

* Add DEiT

* Add SegFormer

* Fix beit output class

* Fix missing return_dict
2022-02-03 17:47:22 +01:00
e2b6e73fa2 [Flax tests] Disable scheduled GPU tests (#15503) 2022-02-03 17:12:14 +01:00
f5d98da29e fix load_weight_prefix (#15101)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-03 15:11:53 +00:00
71dccd0774 fix (#15494)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-03 12:57:28 +01:00
5ec368d79e Correct eos_token_id settings in generate (#15403)
* Correct eos_token_id set in generate

* Set eos_token_id in test

* Correct eos_token_id set in generate

* Set eos_token_id in test
2022-02-03 00:24:40 +01:00
39b5d1a63a fix set truncation attribute in __init__ of PreTrainedTokenizerBase (#15456)
* change truncation_side in init of `PreTrainedTokenizerBase`

Co-authored-by: LSinev <LSinev@users.noreply.github.com>

* add test

* Revert "replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`"

This reverts commit 7a98b87962d2635c7e4d4f00db3948b694624843.

* fix kwargs

* Revert "fix kwargs"

This reverts commit 67b0a5270e8cf1dbf70e6b0232e94c0452b6946f.

* Update tests/test_tokenization_common.py

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* delete truncation_side variable

* reorganize test

* format

* complete doc

* Revert "Revert "replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`""

This reverts commit d5a10a7e2680539e5d9e98ae5d896c893d224b80.

* fix typo

* fix typos to render documentation

* Revert "Revert "Revert "replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`"""

This reverts commit 16cf58811943a08f43409a7c83eaa330686591d0.

* format

Co-authored-by: LSinev <LSinev@users.noreply.github.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-02-02 23:18:09 +01:00
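
With this change the truncation direction can be set at construction time (relying on the `tokenizers>=0.11.1` requirement bumped elsewhere in this range); a sketch:

```
from transformers import AutoTokenizer

# Truncate from the left so the *end* of long inputs is kept
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", truncation_side="left")
enc = tokenizer("word " * 100, truncation=True, max_length=16)
print(len(enc.input_ids))  # 16
```
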
45cac3fade Fix labels stored in model config for token classification examples (#15482)
* Playing

* Properly set labels in model config for token classification example

* Port to run_ner_no_trainer

* Quality
2022-02-02 14:23:43 -05:00
c74f3d4c48 Add W&B backend for hyperparameter sweep (#14582)
# Add support for W&B hyperparameter sweep
This PR:
* allows using wandb for running hyperparameter search.
* The runs are visualized on W&B sweeps dashboard
* This supports running sweeps on parallel devices, all reporting to the same central dashboard.

### Usage
**To run a new hyperparameter search:**
```
trainer.hyperparameter_search(
    backend="wandb", 
    project="transformers_sweep", # name of the project
    n_trials=5,
    metric="eval/loss", # metric to be optimized, default 'eval/loss'. A warning is raised if the passed metric is not found
)
```
This outputs a sweep id, e.g. `my_project/sweep_id`

**To run sweeps on parallel devices:**
Just pass the sweep id which you want to run in parallel
```
trainer.hyperparameter_search(
    backend="wandb", 
    sweep_id = "my_project/sweep_id"
)
```
2022-02-02 14:06:14 -05:00
13297ac71c Fix docstring of ASR pipeline (#15481) 2022-02-02 12:12:22 -05:00
dd360d58d9 fix error posted in issue #15448 (#15480)
* fix error posted in issue #15448

Signed-off-by: bugface <alexgre@ufl.edu>

* clean up - remove commented line

Signed-off-by: bugface <alexgre@ufl.edu>
2022-02-02 10:45:51 -05:00
44b21f117b Save code of registered custom models (#15379)
* Allow dynamic modules to use relative imports

* Work for configs

* Fix last merge conflict

* Save code of registered custom objects

* Map strings to strings

* Fix test

* Add tokenizer

* Rework tests

* Tests

* Ignore fixtures py files for tests

* Tokenizer test + fix collection

* With full path

* Rework integration

* Fix typo

* Remove changes in conftest

* Test for tokenizers

* Add documentation

* Update docs/source/custom_models.mdx

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add file structure and file content

* Add more doc

* Style

* Update docs/source/custom_models.mdx

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Address review comments

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2022-02-02 10:44:37 -05:00
623d8cb475 Adding support for microphone streaming within pipeline. (#15046)
* Adding support for `microphone` streaming within pipeline.

- Uses `ffmpeg` to get microphone data.
- Makes sure alignment is made to `size_of_sample`.
- Works by sending `{"raw": ..data.., "stride": (n, left, right),
"partial": bool}`
directly to the pipeline enabling to stream partial results and still
get inference.
- Let's `partial` information flow through the pipeline to enable caller
  to get it back and choose to display text or not.

- The striding reconstitution is bound to have errors since CTC does not
keep previous state. Currently most of the errors stem from not knowing whether
there's a space between two chunks.
Since we have some left striding info, we could use that during decoding
to choose what to do with those spaces and even extra letters maybe (if
the stride is long enough, it's bound to cover at least a few symbols)

Fixing tests.

Protecting with `require_torch`.

`raw_ctc` support for nicer demo.

Post rebase fixes.

Revamp to split raw_mic_data from its live chunking.

- Requires a refactor to make everything a bit cleaner.

Automatic resampling.

Small fix.

Small fix.

* Post rebase fix (need to let super handle more logic, reorder args.)

* Update docstrings

* Docstring format.

* Remove print.

* Prevent flow of `input_values`.

* Fixing `stride` too.

* Fixing the PR by removing `raw_ctc`.

* Better docstrings.

* Fixing init.

* Update src/transformers/pipelines/audio_utils.py

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Update tests/test_pipelines_automatic_speech_recognition.py

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Quality.

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
2022-02-02 15:12:12 +01:00
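
A sketch of how the streaming helper described above might be consumed (the helper name `ffmpeg_microphone_live` and the `partial` flag follow this PR's description; exact item keys may differ):

```
from transformers import pipeline
from transformers.pipelines.audio_utils import ffmpeg_microphone_live

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
sampling_rate = pipe.feature_extractor.sampling_rate

# Yields {"raw": ..., "stride": (n, left, right), "partial": bool} chunks from the mic
mic = ffmpeg_microphone_live(sampling_rate=sampling_rate, chunk_length_s=5.0, stream_chunk_s=1.0)
for item in pipe(mic):
    end = "\r" if item.get("partial") else "\n"  # overwrite partial results in place
    print(item["text"], end=end)
```
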
d718c0c3a8 [Wav2Vec2ProcessorWithLM] add alpha & beta to batch decode & decode (#15465) 2022-02-02 12:59:40 +01:00
1d94d57546 Add option to resize like torchvision's Resize (#15419)
* Add torchvision's resize

* Rename torch_resize to default_to_square

* Apply suggestions from code review

* Add support for default_to_square and tuple of length 1
2022-02-02 09:44:22 +01:00
b9418a1d97 Update tutorial docs (#15165)
* first draft of pipeline, autoclass, preprocess tutorials

* apply review feedback

* 🖍 apply feedback from patrick/niels

* 📝add output image to preprocessed image

* 🖍 apply feedback from patrick
2022-02-01 18:31:35 -06:00
c157c7e3fd Update fine-tune docs (#15259)
* add fine-tune tutorial

* make edits, fix style

* 📝 make edits

* 🖍 fix code format links to external libraries

* 🔄revert code formatting

* 🖍 use DefaultDataCollator instead of DataCollatorWithPadding
2022-02-01 18:28:12 -06:00
d0b5ed110a Harder check for IndexErrors in QA scripts (#15438)
* Harder check for IndexErrors in QA scripts

* Make test stronger
2022-02-01 15:49:13 -05:00
8e5d4e4906 Trainer.push_to_hub always tries to push to the Hub (#15463) 2022-02-01 15:49:04 -05:00
37800f1365 [BartTokenizer] remove inheritance on RobertaTokenizer (#15461)
* refactor bart tokenizers

* doc

* replace assert with ValueError
2022-02-01 20:59:24 +01:00
f427e75049 use mean instead of elementwise_mean in XLMPredLayer (#15436)
* use mean instead of elementwise_mean

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-01 19:08:17 +01:00
7b8bdd8601 fix the tokenizer_config.json file for the slow tokenizer when a fast version is available (#15319)
* add new test

* update test

* remove `tokenizer_file` from `additional_files_names` in `tokenization_utils_base.py`

* add `tokenizer_file` for the fast only tokenizer

* change global variables layoutxlm

* remove `"tokenizer_file"` from DPR tokenizer's Global variables

* remove `tokenizer_file` from herbert slow tokenizer init

* `"tokenizer_file"` from LED tokenizer's Global variables

* remove `tokenizer_file` from mbart slow tokenizer init

* remove `tokenizer_file` from slow tokenizer template

* adapt to versioning

* adapt the `test_tokenizer_mismatch_warning` test

* clean test

* clarify `VOCAB_FILES_NAMES` in tokenization_utils_fast.py

* Revert "remove `tokenizer_file` from mbart slow tokenizer init"

This reverts commit 0dbb723fa9c7599d4640fe30b3647a74eb4a64e1.

* Revert "`"tokenizer_file"` from LED tokenizer's Global variables"

This reverts commit 5a3f879bdd651233f3d74a3d1146c34cde82b0c2.

* Revert "remove `tokenizer_file` from herbert slow tokenizer init"

This reverts commit f5e10007b7b0ec5345e015b9de7ffec72c5407fd.

* Revert "remove `"tokenizer_file"` from DPR tokenizer's Global variables"

This reverts commit da0895330bedfafc81ae3073470a9348c669f032.

* set `tokenizer_file` in super `__init__` of mbart
2022-02-01 16:48:25 +01:00
6d585fe0f0 replace assert with exception for padding_side arg in PreTrainedTokenizerBase __init__ (#15454)
* replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`

* add test

* fix kwargs

* reformat test

* format

* format

* fix typo to render the documentation
2022-02-01 16:13:58 +01:00
d2749cf72e Update README.md (#15462)
fix typo
2022-02-01 10:04:30 -05:00
1c9648c457 [M2M100, XGLM] fix positional emb resize (#15444) 2022-02-01 14:32:55 +01:00
2ca6268394 fix from_vision_text_pretrained doc example (#15453)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-01 12:20:22 +01:00
dc05dd539f Fix TF Causal LM models' returned logits (#15256)
* Fix TF Causal LM models' returned logits

* Fix expected shape in the tests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-01 11:04:07 +00:00
af5c3329d7 remove "inputs" in tf common test script (no longer required) (#15262)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-02-01 10:09:49 +00:00
d12ae81664 [generate] fix synced_gpus default (#15446) 2022-01-31 13:58:27 -08:00
d4f201b860 skip test for XGLM (#15445) 2022-01-31 16:53:16 -05:00
0c17e766cb Error when group_by_length is used with an IterableDataset (#15437) 2022-01-31 15:33:16 -05:00
125a2882b4 Update modeling_wav2vec2.py (#15423)
* Update modeling_wav2vec2.py

With very tiny sound files (less than 0.1 seconds) the num_masked_span can be too long. The issue is described in issue #15366 and discussed with @patrickvonplaten.

* correct errors with mask time indices

* remove bogus file

* make fix-copies

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-31 21:22:11 +01:00
d984b10335 Add 'with torch.no_grad()' to BEiT integration test forward passes (#14961)
* Add 'with torch.no_grad()' to BEiT integration test forward pass

* Fix inconsistent use of tabs and spaces in indentation
2022-01-31 15:12:10 -05:00
09f9d07271 Misfiring tf warnings (#15442)
* Fix spurious warning in TF TokenClassification models

* Fixing one last spurious warning

* Removing outdated warning altogether
2022-01-31 19:17:59 +00:00
6915174e68 [RobertaTokenizer] remove inheritance on GPT2Tokenizer (#15429)
* refactor roberta tokenizer

* refactor fast tokenizer

* remove old comment
2022-01-31 19:50:25 +01:00
a5ecbf7348 correct positional emb size (#15441) 2022-01-31 19:47:49 +01:00
5a70987301 Fix TFLEDModel (#15356)
* fix tf led

* fix

* fix

* Add test_pt_tf_model_equivalence_extra for TFLED

* add a (temporary) test

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-31 19:35:54 +01:00
87918d3221 [examples/Flax] add a section about GPUs (#15198)
* add a section about GPUs

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-31 19:20:53 +01:00
b8810847d0 [Trainer] suppress warning for length-related columns (#15421)
* [Trainer] suppress warning for length-related columns

* improve message

* Update src/transformers/trainer.py
2022-01-31 18:51:29 +01:00
3385ca2582 Change REALM checkpoint to new ones (#15439)
* Change REALM checkpoint to new ones

* Last checkpoint missing
2022-01-31 12:50:20 -05:00
7e56ba2864 Fix spurious warning in TF TokenClassification models (#15435) 2022-01-31 17:09:16 +00:00
554d333ece Fix loss calculation in TFXXXForTokenClassification models (#15294)
* Fix loss calculation in TFFunnelForTokenClassification

* revert the change in TFFunnelForTokenClassification

* fix FunnelForTokenClassification loss

* fix other TokenClassification loss

* fix more

* fix more

* add num_labels to ElectraForTokenClassification

* revert the change to research projects

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-31 11:43:08 -05:00
44c7857b87 [deepspeed doc] fix import, extra notes (#15400)
* [deepspeed doc] fix import, extra notes

* typo
2022-01-31 08:28:10 -08:00
47df0f2234 Add header (#15434) 2022-01-31 11:15:54 -05:00
7fc6f41d91 Add doc for add-new-model-like command (#15433) 2022-01-31 11:10:45 -05:00
282ae123e2 add t5 ner finetuning (#15432) 2022-01-31 17:03:06 +01:00
d4b3e56d64 [Hotfix] Fix Swin model outputs (#15414)
* Fix Swin model outputs

* Rename pooler
2022-01-31 16:32:14 +01:00
38dfb40ae3 import torch.utils.checkpoint (#15427) 2022-01-31 15:51:50 +01:00
f624249d8b [Robust Speech Challenge] Add missing LR parameter (#15428) 2022-01-31 15:50:56 +01:00
3254080d45 Update README.md (#15430)
fix typo
2022-01-31 09:48:20 -05:00
aa19f478ac Add (M)Luke model training for Token Classification in the examples (#14880)
* Add Luke training

* Fix true label tags

* Fix true label tags

* Fix true label tags

* Update the data collator for Luke

* Some training refactor for Luke

* Improve data collator for Luke

* Fix import

* Fix datasets concatenation

* Add the --max_entity_length argument for Luke models

* Remove unused code

* Fix style issues

* Fix style issues

* Move the Luke training into a separate folder

* Fix style

* Fix naming

* Fix filtering

* Fix filtering

* Fix filter

* Update some preprocessing

* Move luke to research_projects

* Checkstyle

* Address comments

* Fix style
2022-01-31 07:58:18 -05:00
0094eba363 Fix additional DataTrainingArguments documentation (#15408)
(This is an editorial change only)
2022-01-31 07:45:11 -05:00
ee5de66349 Add SegformerFeatureExtractor to Auto API (#15410) 2022-01-31 11:38:08 +01:00
0f69b924fb [XGLMTokenizer] fix init and add in AutoTokenizer (#15406) 2022-01-30 15:35:53 +01:00
f380bf2b61 Fix the inconsistency of loss calculation between PT/TF XLNetLMHeadModel (#15298)
* Fix the inconsistency of loss calculation between PT/TF XLNetLMHeadModel

* overwrite test_loss_computation

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-29 15:08:35 +00:00
e09473a817 Add support for XLM-R XL and XXL models by modeling_xlm_roberta_xl.py (#13727)
* add xlm roberta xl

* add convert xlm xl fairseq checkpoint to pytorch

* fix init and documents for xlm-roberta-xl

* fix indention

* add test for XLM-R xl,xxl

* fix model hub name

* fix some stuff

* up

* correct init

* fix more

* fix as suggestions

* add torch_device

* fix default values of doc strings

* fix leftovers

* merge to master

* up

* correct hub names

* fix docs

* fix model

* up

* finalize

* last fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add copied from

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-29 13:42:37 +01:00
16d4acbfdb Get started docs (#15098)
* clean commit of changes

* apply review feedback, make edits

* fix backticks, minor formatting

* 🖍 make fixup and minor edits

* 🖍 fix # in header

* 📝 update code sample without from_pt

* 📝 final review
2022-01-28 19:01:37 -06:00
cabd6d26a2 Update model share tutorial (#15288)
* add model sharing tutorial

* 🖍 apply feedback from review

* 📝 make edits

* 🖍 fix formatting

* 📝 convert from pt checkpoint to flax

* 📝 final review
2022-01-28 18:49:26 -06:00
c98a6ac211 Use argument for preprocessing workers in run_summairzation (#15394) 2022-01-28 18:34:10 -05:00
db07956740 Fix missing eps arg for LayerNorm in ElectraGeneratorPredictions (#15332)
* fix missing eps

* Same fix for ConvBertGeneratorPredictions

* Same fix for AlbertMLMHead

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-28 18:32:26 -05:00
297602c7f4 [deepspeed] saving checkpoint fallback when fp16 weights aren't saved (#14948)
* [deepspeed] saving checkpoint fallback when fp16 weights aren't saved

* Bump required deepspeed version to match usage when saving checkpoints

* update version

Co-authored-by: Mihai Balint <balint.mihai@gmail.com>
2022-01-28 11:05:47 -08:00
d25e25ee2b Add XGLM models (#14876)
* add xglm

* update vocab size

* fix model name

* style and tokenizer

* typo

* no mask token

* fix pos embed compute

* fix args

* fix tokenizer

* fix positions

* fix tokenization

* style and dic fixes

* fix imports

* add fast tokenizer

* update names

* add pt tests

* fix tokenizer

* fix typo

* fix tokenizer import

* fix fast tokenizer

* fix tokenizer

* fix converter

* add tokenizer test

* update checkpoint names

* fix tokenizer tests

* fix slow tests

* add copied from comments

* rst -> mdx

* flax model

* update flax tests

* quality

* style

* doc

* update index and readme

* fix copies

* fix doc

* update toctree

* fix indent

* minor fixes

* fix config doc

* don't save embed_pos weights

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* address Sylvain's comments, a few doc fixes

* fix check_repo

* align order of arguments

* fix copies

* fix labels

* remove unnecessary mapping

* fix saving tokenizer

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-28 18:55:23 +01:00
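
Basic causal-LM usage with the new model, assuming the `facebook/xglm-564M` checkpoint referenced in the PR:

```
from transformers import AutoTokenizer, XGLMForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/xglm-564M")
model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M")

inputs = tokenizer("I wanted to conserve energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_length=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
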
b6b79faa7e Make links explicit (#15395)
* Make links explicit

* Removing reference to compute_metrics() since it's kind of PyTorch-specific
2022-01-28 17:31:22 +00:00
6df29ba5e6 fix wrong tokenizer checkpoint name in flax marian (#15391)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-28 16:53:25 +01:00
507601a5cf Prepare deprecated ONNX exporter for torch v1.11 (#15388)
* Prepare deprecated ONNX exporter for PyTorch v1.11

* Add deprecation warning
2022-01-28 16:32:47 +01:00
4996922b6d [docs] fix wrong file name in pr_check (#15380) 2022-01-28 07:52:01 -05:00
8f5d62fdb1 Fix bad_words_ids not working with sentencepiece-based tokenizers (#15343)
* Fix `bad_word_ids` not working with sentencepiece-based tokenizers

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-28 12:39:55 +01:00
06107541d3 Fixing support for batch_size and num_return_sequences in text-generation pipeline (#15318)
* Fixing support for `batch_size` and `num_return_sequences` in the `text-generation` pipeline

And `text2text-generation` too.

The bug was caused by the batch_size containing both the incoming batch
**and** the generated `num_sequences`.

The fix simply consists into splitting both of these again into
different dimensions.

* TF support.

* Odd backward compatibility script in the way.
2022-01-28 12:15:30 +01:00
c4d1fd77fa Set syncfree AdamW as the default optimizer for xla:gpu device in amp mode (#15361)
* Use syncfree AdamW for xla:gpu device by default

* Make syncfree AdamW optional
2022-01-27 20:05:31 -05:00
2e4559fa37 Add init to BORT (#15378)
* Add init to BORT

* BORT should be in init
2022-01-27 15:16:54 -05:00
f5db6ce76a Fix code format for Accelerate doc (#15335)
* 🖍 fix code syntax to external libraries and replace image

* 🔄revert code formatting, replace image with code block

* 🖍 apply feedback
2022-01-27 13:49:04 -06:00
0b07230409 Allow relative imports in dynamic code (#15352)
* Allow dynamic modules to use relative imports

* Add tests

* Add one last test

* Changes
2022-01-27 14:47:59 -05:00
628b59e51d Bump numpy from 1.19.2 to 1.21.0 in /examples/research_projects/lxmert (#15369)
Bumps [numpy](https://github.com/numpy/numpy) from 1.19.2 to 1.21.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt)
- [Commits](https://github.com/numpy/numpy/compare/v1.19.2...v1.21.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-27 14:46:15 -05:00
ca0848b2ff Bump notebook in /examples/research_projects/visual_bert (#15368)
Bumps [notebook](http://jupyter.org) from 6.1.5 to 6.4.1.

---
updated-dependencies:
- dependency-name: notebook
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2022-01-27 14:45:58 -05:00
7d45a2e81c Bump numpy in /examples/research_projects/visual_bert (#15367)
Bumps [numpy](https://github.com/numpy/numpy) from 1.19.2 to 1.21.0.
- [Release notes](https://github.com/numpy/numpy/releases)
- [Changelog](https://github.com/numpy/numpy/blob/main/doc/HOWTO_RELEASE.rst.txt)
- [Commits](https://github.com/numpy/numpy/compare/v1.19.2...v1.21.0)

---
updated-dependencies:
- dependency-name: numpy
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-27 14:45:18 -05:00
a81fd35524 Fix tests_fetcher (#15376) 2022-01-27 14:17:48 -05:00
eab338104d Docs for version v4.16.0 2022-01-27 13:11:51 -05:00
f87db5e412 Release: v4.16.0 2022-01-27 13:06:33 -05:00
c43749289d Example script for PushToHubCallback (#15375)
* Example script for PushToHubCallback

* Expanding description slightly
2022-01-27 16:16:24 +00:00
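
For context, wiring the callback into a Keras training loop looks roughly like this (`model`, `tokenizer`, and `train_dataset` are placeholders; the repo name is illustrative):

```
from transformers.keras_callbacks import PushToHubCallback

push_to_hub_callback = PushToHubCallback(
    output_dir="./model_save",
    tokenizer=tokenizer,                  # placeholder tokenizer from earlier setup
    hub_model_id="my-username/my-model",  # illustrative Hub repo name
)
model.fit(train_dataset, epochs=3, callbacks=[push_to_hub_callback])
```
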
8f6454bfac Add proper documentation for Keras callbacks (#15374)
* Add proper documentation for Keras callbacks

* Add dummies
2022-01-27 10:51:38 -05:00
2de90beeeb Super-small fix stops us confusing Keras console logging by modifying its logs (#15373) 2022-01-27 15:43:43 +00:00
fa6dce250f Implement fixes for TrainingArguments doc (#15370)
Co-authored-by: osanseviero <osanseviero@gmail.com>

Co-authored-by: osanseviero <osanseviero@gmail.com>
2022-01-27 10:25:43 -05:00
ade7371a41 improve saving strategy of sentencepiece tokenizer (#15328)
* add new test

* add a feature to save the sentencepiece tokenizer model when the init file was deleted

* update marian

* update m2m_100

* fix marian

* update speech to text

* override test for layoutxlm

* fix saving bartpho

* remove harcoded values bartpho

* special token string version

* finish bartpho

* override layoutxml test

* add mbart

* move special tokens list

* format

* Revert "format"

This reverts commit 37a40df37903a932c2f951cbd33acb684246bae7.

* simplify list of string of special tokens

* Re-write `self.fairseq_tokens_to_ids ` initialization logic with special tokens

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2022-01-27 16:24:51 +01:00
196cce6e9b Add a device argument to the eval script (#15371)
* Device argument for the eval script

* Default to none

* isort
2022-01-27 15:58:55 +01:00
6beae766ee Fix KerasMetricCallback prediction with generate() and inference of column names (#15351)
* Fix prediction with generate() and the inference of column names
Should now have very few differences with the PyTorch implementation

* Minor edit to parent class

* Update src/transformers/keras_callbacks.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Explaining the dict conversion

* Putting main_input_name back

* Fixes to main_input_name

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-27 14:13:23 +00:00
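
A sketch of the callback this fix targets — it runs `metric_fn` over predictions on `eval_dataset` at each epoch end (the datasets and `model` here are placeholders):

```
import numpy as np
from transformers.keras_callbacks import KerasMetricCallback

def metric_fn(eval_predictions):
    predictions, labels = eval_predictions
    return {"accuracy": float((np.argmax(predictions, axis=-1) == labels).mean())}

metric_callback = KerasMetricCallback(metric_fn=metric_fn, eval_dataset=tf_eval_dataset)
model.fit(tf_train_dataset, epochs=3, callbacks=[metric_callback])  # placeholders
```
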
da5ef25db9 Push to hub save (#15327)
* Adapt doc and push at every save

* style
2022-01-27 09:00:54 -05:00
9f831bdeaf [DocTests Speech] Add doc tests for all speech models (#15031)
* fix_torch_device_generate_test

* remove @

* doc tests

* up

* up

* fix doctests

* adapt files

* finish refactor

* up

* save intermediate

* add more logic

* new change

* improve

* next try

* next try

* next try

* next try

* fix final spaces

* fix final spaces

* improve

* renaming

* correct more bugs

* finish wavlm

* add comment

* run on test runner

* finish all speech models

* adapt

* finish
2022-01-27 14:29:31 +01:00
4df69506a8 Fix YosoConfig doc (#15353) 2022-01-26 21:06:27 +01:00
fc8fc400e3 [docs] post-PR merge fix (#15355)
* [docs] post-PR merge fix

* Update docs/source/main_classes/deepspeed.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-26 11:23:32 -08:00
99a2771189 Add YOSO (#15091)
* Add cookiecutter files

* Add cuda kernels and cpp files

* Update modeling_yoso.py

* Add .h files

* Update configuration_yoso.py

* Updates

* Remove tokenizer

* Code quality

* Update modeling_yoso.py

* Update modeling_yoso.py

* Fix failing test

* Update modeling_yoso.py

* Fix code quality

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review and fix integration tests

* Update src/transformers/models/yoso/modeling_yoso.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Apply suggestions from code review

* Fix copied from statement

* Fix docstring

* Fix code quality

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions and fix mask

* Apply suggestions from code review

* Fix code quality

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix docstrings

* Fix code quality

* Remove trailing whitespace

* Update yoso.mdx

* Move kernel loading to YosoEncoder

* make style

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/yoso/modeling_yoso.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Add short summary to docs

* Update docs/source/model_doc/yoso.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update yoso.mdx

* Update docs/source/model_doc/yoso.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Remove CausalLM model and add copied from

* Remove autoregressive code

* Remove unused imports

* add copied from for embeddings

* Fix code quality

* Update docs/source/model_doc/yoso.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestion from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-26 19:18:29 +01:00
6292532fd1 Update doc writing guide (#15350) 2022-01-26 12:54:11 -05:00
19732cc07a Fix 'eval_split_name' described as defaulting to 'train' (#15348)
The default is correct (`test`) but the description is not.
2022-01-26 10:19:38 -05:00
5d8b98608c Fix deepspeed docs (#15346) 2022-01-26 07:24:33 -05:00
96161ac408 make table into valid Markdown table syntax (#15337) 2022-01-26 07:10:00 -05:00
24e2fa1590 Fix encoder-decoder models when labels is passed (#15172)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-26 10:14:46 +01:00
e79a0faeae Added missing code in example notebook - custom datasets fine-tuning (#15300)
* Added missing code in example notebook - custom datasets fine-tuning

Added missing code in the tokenize_and_align_labels function in the example notebook on custom datasets - token classification.
The missing code concerns adding labels for all but the first token in a single word.
The added code was taken directly from huggingface official example - this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).

* Changes requested in the review - keep the code as simple as possible
2022-01-25 17:26:17 -05:00
0501beb846 Add 🤗 Accelerate tutorial (#15263)
* add accelerate tutorial

* 🖍 apply feedback from review

* 📝 make edits
2022-01-25 13:46:11 -06:00
637e81752a [Tests] Fix test (#15324)
* Fix Swin device

* Remove print statement
2022-01-25 15:48:25 +01:00
e695470794 Avoid using get_list_of_files (#15287)
* Avoid using get_list_of_files in config

* Wip, change tokenizer file getter

* Remove call in tokenizer files

* Remove last call to get_list_model_files

* Better tests

* Unit tests for new function

* Document bad API
2022-01-25 09:41:21 -05:00
e65bfc0971 Try without bad instruction 2022-01-24 15:55:29 -05:00
81156d20cd Add model like (#14992)
* Add new model like command

* Bad doc-styler

* black and doc-styler, stop fighting!

* black and doc-styler, stop fighting!

* At last

* Clean up

* Typo

* Bad doc-styler

* Bad doc-styler

* All good maybe?

* Use constants

* Add doc and type hints

* More cleaning

* Add doc

* Fix Copied from

* Doc template

* Use typing.Pattern instead

* Framework-specific files

* Fixes

* Select frameworks clean model init

* Deal with frameworks in main init

* fixes

* Last fix

* Prompt user for info

* Delete example config

* Last fixes

* Add test config

* Fix bug with model_type included in each other

* Fixes

* More fixes

* More fixes

* Adapt config

* Remove print statements

* Will fix tokenization later, leave it broken for now

* Add test

* Quality

* Try this way

* Debug

* Maybe by setting the path?

* Let's try another way

* It should go better when actually passing the arg...

* Remove debug statements and style

* Fix config

* Add tests

* Test require the three backends

* intermediate commit

* Revamp pattern replacements and start work on feature extractors

* Adapt model info

* Finalize code for processors

* Fix in main init additions

* Finish questionnaire for processing classes

* Fix file name

* Fix for real

* Fix patterns

* Style

* Remove needless warnings

* Copied from should work now.

* Include Copied form in blocks

* Add test

* More fixes and tests

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comment

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2022-01-24 15:25:10 -05:00
457dd4392b [Examples] Correct run ner label2id for fine-tuned models (#15017)
* up

* up

* make style

* apply Sylvain's suggestions

* apply changes to accelerate as well

* more changes

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-24 21:18:04 +01:00
8d6acc6c29 [Beam Search] Correct returned beam scores (#14654)
* better

* save intermediate

* finish code

* up

* docs

* Apply suggestions from code review

* up

* add compute_transition_beam_scores function to the model and make sure scores are correct with eos

* apply Nico's comments

* Apply suggestions from code review

* another fix
2022-01-24 21:13:21 +01:00
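A sketch of how the per-step scores added here can be recovered; the method and argument names are taken from the PR description above and may differ slightly in detail:

```python
# Sketch: recover per-step transition scores from beam search (#14654).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: hello", return_tensors="pt")
out = model.generate(
    **inputs, num_beams=4, return_dict_in_generate=True, output_scores=True
)
# beam_indices tracks which beam each token came from at every step
transition_scores = model.compute_transition_beam_scores(
    out.sequences, out.scores, out.beam_indices
)
```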
e239fc3b0b Replace NystromformerTokenizer with AutoTokenizer (#15312) 2022-01-24 16:33:43 +01:00
dcaa5100c9 [LayoutLMV2 Tests] Make sure input is on GPU (#15314)
* [LayoutLMV2 Tests] Make sure input is on GPU

* correct empty line
2022-01-24 15:54:47 +01:00
c15bb3fe19 [Fix doc example] fix missing import jnp (#15291)
* fix missing import jnp

* Fix missing jax and k=1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-24 14:54:23 +01:00
eac4aecc3d Remove old debug code leftover. (#15306) 2022-01-24 07:27:45 -05:00
2390b2cf65 Fix a typo in tag addition (#15286)
* Fix a typo in tag addition

* Put it back again
2022-01-24 07:21:42 -05:00
c972433a85 Update CONTRIBUTING.md (#15290)
Fix typo in doc
2022-01-24 07:21:31 -05:00
4bf97415a4 Update eval.py (#15310) 2022-01-24 11:46:38 +01:00
b7cb126ccc [PyTorch-nightly-test] Fix Wav2Vec2 LM & Phoneme tests (#15272)
* [PyTorch-nightly-test] Fix Wav2Vec2 LM & Phoneme tests

* Update .github/workflows/self-nightly-scheduled.yml

* change lines

* Apply suggestions from code review
2022-01-24 10:53:53 +01:00
6ac77534bf Refine errors for pretrained objects (#15261)
* Refine errors for pretrained objects

* PoC to avoid using get_list_of_files

* Adapt tests to use new errors

* Quality + Fix PoC

* Revert "PoC to avoid using get_list_of_files"

This reverts commit cb93b7cae8504ef837c2a7663cb7955e714f323e.

* Revert "Quality + Fix PoC"

This reverts commit 3ba6d0d4ca546708b31d355baa9e68ba9736508f.

* Fix doc

* Revert PoC

* Add feature extractors

* More tests and PT model

* Adapt error message

* Feature extractor tests

* TF model

* Flax model and test

* Merge flax auto tests

* Add tokenization

* Fix test
2022-01-21 15:00:09 -05:00
80af1048cf [Wav2Vec2ProcessorWithLM] improve multi processing (#15247)
* [Wav2Vec2ProcessorWithLM] improve multi processing

* close pool
2022-01-21 18:30:10 +01:00
4cff3fae11 Second failing test 2022-01-21 12:19:28 -05:00
f6253147df Skip failing test 2022-01-21 12:03:21 -05:00
7799b6128f [Fix doc example] TFLayoutLMForTokenClassification: missing import tf (#15268)
* fix import

* remove import torch

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-21 11:18:11 -05:00
11afb709ec [Robust Speech Challenge] Add timeline (#15274) 2022-01-21 17:12:09 +01:00
3c3cf17a49 fix link (#15278) 2022-01-21 09:52:13 -05:00
95a75a715f Specify providers explicitly in ORT session initialization (#15235)
* Specify providers explicitly in ORT session initialization

Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2022-01-21 15:49:29 +01:00
833635e259 Move BART + ONNX example to research_projects (#15271)
* Move BART + ONNX example to research_projects

* Add author information
2022-01-21 14:47:34 +01:00
183ce067e0 Fix (#15276)
* Fix

* make style

* Remove trailing commas

* make style
2022-01-21 08:46:15 -05:00
b4ce313e6c Prepare ONNX export for torch v1.11 (#15270)
* Prepare ONNX export for torch v1.11
2022-01-21 14:28:19 +01:00
126bddd1ba Add module_spec to new model 2022-01-21 08:12:44 -05:00
c962c2adbf Adds missing module_specs for usages of _LazyModule (#15230)
* Add missing __spec__ for transformers.models.auto

* Moves the __spec__-test to the UnitTest class

* Adds module_spec to all instances of _LazyModule

* Refactors an old test from pytest to unittest
2022-01-21 07:30:12 -05:00
6c7b68d414 [ViTMAE] Add image pretraining script (#15242)
* Add script

* Improve script

* Fix data collator

* Update README

* Add label_names argument

* Apply suggestions from code review

* Add config parameters

* Update script

* Fix bug

* Improve README

* Improve README and add test

* Fix import

* Add image_column_name
2022-01-21 12:11:08 +01:00
d43e308e7f Add Swin Transformer (#15085)
* Add all files

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Updates

* Apply suggestions from review

* Fix failing tests

* Update __init__.py

* Update configuration_swin.py

* Update auto_factory.py

* Fix pytests

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix tests and default checkpoint

* Fix Recursion error

* Code quality

* Remove copied from

* Update modeling_swin.py

* Code quality

* Update modeling_swin.py

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

* Fix feature extractor

* Fix code quality

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

* Update configuration_swin.py

* Update default checkpoint

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/model_doc/swin.mdx

Co-authored-by: Mishig Davaadorj <mishig.davaadorj@coloradocollege.edu>

* Update conversion script

* Reformat conversion script

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Mishig Davaadorj <mishig.davaadorj@coloradocollege.edu>
2022-01-21 12:10:41 +01:00
515ed3ad2a Fix doc examples (#15257) 2022-01-20 21:51:51 +01:00
ad7390636d Tentative workflow improvement (#15255) 2022-01-20 13:51:19 -05:00
57820456bd Fix crash when logs are empty because Keras has wiped them out of spite (#15258) 2022-01-20 18:40:48 +00:00
1fc0fa4617 Make sure to raise NotImplementedError with correct method name (#15253) 2022-01-20 10:37:35 -05:00
f00f22a3e2 Fixes tf_default_data_collator sometimes guessing the wrong dtype for labels (#15234)
* Fixes tf_default_data_collator sometimes guessing the wrong dtype for labels

* Add test for numpy scalar inputs
2022-01-20 14:26:51 +00:00
4a6a35bc65 [Fix doc example] missing import (#15240)
* fix import

* fix style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-20 08:47:24 -05:00
08b41b413a Update pipelines.mdx (#15243)
fix few spelling mistakes
2022-01-20 08:46:48 -05:00
85ea462c08 Update README.md (#15246)
Clarify OVH instruction
2022-01-20 13:40:26 +03:00
e57468b8a8 Update README.md (#15239)
Add an OVHcloud tutorial URL for the Robust Speech Challenge
2022-01-20 11:46:50 +03:00
baf1ebe9f0 Fix usage of additional kwargs in from_encoder_decoder_pretrained in encoder-decoder models (#15056)
* [EncoderDecoder] Add test for usage of extra kwargs

* [EncoderDecoder] Fix usage of extra kwargs in from pretrained

* [EncoderDecoder] apply suggested changes (passing **kwargs_encoder)

* [EncoderDecoder] create new test function and make sure it passes

Co-authored-by: jonas <jsnfly@gmx.de>
2022-01-19 23:00:33 +01:00
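A sketch of the behaviour being fixed: kwargs prefixed with `encoder_`/`decoder_` are routed to the respective sub-model (the parameter names here are ordinary BERT config options, chosen for illustration):

```python
# Sketch: prefixed kwargs reach the encoder/decoder configs (#15056).
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased",
    "bert-base-uncased",
    encoder_hidden_dropout_prob=0.2,  # forwarded to the encoder
    decoder_attention_probs_dropout_prob=0.2,  # forwarded to the decoder
)
assert model.config.encoder.hidden_dropout_prob == 0.2
```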
3fefee9910 Make smart chunking (long files) work on ASR ctc_with_lm. (#15219)
* [WIP] Make smart chunking (long files) work on ASR ctc_with_lm.

* Slow test with functionality.

* Fixing regular test.

* fix for batch size 1

* Handling batch outside `rescale_Stride`.

- Renamed to `rescale_stride`.

* Disable equality in the test.

* Remove print.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-19 21:04:26 +01:00
80f7296091 Update Trainer code example (#15070)
* Update code example

* Fix code quality

* Add comment
2022-01-19 20:15:12 +01:00
ac227093e4 Add ViLT (#14895)
* First commit

* Add conversion script

* Make conversion script work for base model

* More improvements

* Update conversion script, works for vqa

* Add indexing argument to meshgrid

* Make conversion script work for ViltForPreTraining

* Add ViltForPreTraining to docs

* Fix device issue

* Add processor

* Add MinMaxResize to feature extractor

* Implement call method of ViltProcessor

* Fix tests

* Add integration test

* Add loss calculation for VQA

* Improve tests

* Improve some more tests

* Debug tests

* Small improvements

* Add support for attention_mask

* Remove mask_it

* Add pixel_mask

* Add tests for ViltFeatureExtractor

* Improve tests

* Add ViltForNaturalLanguageVisualReasoning

* Add ViltForNaturalLanguageVisualReasoning to conversion script

* Minor fixes

* Add support for image_embeds, update docstrings to markdown

* Update docs to markdown

* Improve conversion script

* Rename ViltForPreTraining to ViltForMaskedLM

* Improve conversion script

* Convert docstrings to markdown

* Fix code example of retrieval model

* Properly convert masked language model

* Add integration test for nlvr

* Fix code quality

* Apply suggestions from code review

* Add copied from statements

* Fix pretrained_config_archive_map

* Fix docs

* Add model to README

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply more suggestions from code review

* Make code more readable

* Add ViltForNaturalLanguageVisualReasoning to the tests

* Rename ViltForVisualQuestionAnswering to ViltForQuestionAnswering

* Replace pixel_values_2 by single tensor

* Add hidden_states and attentions

* Fix one more test

* Fix all tests

* Update year

* Fix rebase issues

* Fix another rebase issue

* Remove ViltForPreTraining from auto mapping

* Rename ViltForImageRetrievalTextRetrieval to ViltForImageAndTextRetrieval

* Make it possible to use BertTokenizerFast in the processor

* Use BertTokenizerFast by default

* Rename ViltForNaturalLanguageVisualReasoning, define custom model output

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-19 19:51:59 +01:00
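A short usage sketch for the new model; the checkpoint name is the fine-tuned VQA checkpoint on the Hub:

```python
# Sketch: visual question answering with the newly added ViLT.
import requests
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(image, "How many cats are there?", return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```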
691878ee2f Update README.md (#15233) 2022-01-19 18:03:17 +01:00
f4b7420dfe Fix checkpoint for ViT Config 2022-01-19 11:22:54 -05:00
6a3c883c8b Fix PR number (#15231)
* Fix PR number

* Fix PR number
2022-01-19 11:00:16 -05:00
f778edb739 Fix typo in BERT tokenization file (#15228)
* Fix typo

* Fix copies
2022-01-19 10:16:19 -05:00
2a5a384970 fix speech event readme (#15227) 2022-01-19 15:30:03 +01:00
842298f84f [ViTMAE] Various fixes (#15221)
* Add MAE to AutoFeatureExtractor

* Add link to notebook

* Fix relative paths
2022-01-19 15:27:57 +01:00
6d92c429c7 Update README.md (#15226) 2022-01-19 15:23:00 +01:00
19c217b4b7 Update README.md 2022-01-19 15:21:03 +01:00
5439cda7f0 Update README.md 2022-01-19 15:19:57 +01:00
841d979190 Add FastTokenizer to REALM (#15211)
* Remove BertTokenizer abstraction

* Add FastTokenizer to REALM

* Fix config archive map

* Fix copies

* Update realm.mdx

* Apply suggestions from code review
2022-01-19 15:19:36 +01:00
021b52e7a8 fix name 'TFFunnelTokenizer' is not defined (#15225)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-19 09:06:00 -05:00
653379c094 Build dev documentation (#15210)
* Wrap up

* Remove secret

* Fix path

* Typo

Revert image switch

* Specific token for comments

* Cleaner comments

* Correct PR number

* Explicit master install

* Force uninstall
2022-01-19 08:47:34 -05:00
2708bfa127 Rename compute_loss in TF models (#15207)
* Rename compute_loss to hf_compute_loss to avoid conflicts with the new Keras method

* make style

* Adding deprecation warning to `compute_loss`

* Fix sneaky reference to compute_loss

* Replace logger.warning with warnings.warn

* Clarifying warning and deprecation timeline
2022-01-19 13:29:07 +00:00
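A sketch of the renamed helper; per the commits above, `compute_loss` stays around as a deprecated alias for now:

```python
# Sketch: the Transformers loss helper on TF models is now hf_compute_loss.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

inputs = tokenizer("hello world", return_tensors="tf")
logits = model(**inputs).logits
loss = model.hf_compute_loss(tf.constant([1]), logits)  # was compute_loss(...)
```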
d1f5ca1afd [FLAX] glue training example refactor (#13815)
* refactor run_flax_glue.py

* updated readme

* rm unused import and args typo fix

* refactor

* make consistent arg name across task

* has_tensorboard check

* argparse -> argument dataclasses

* refactor according to review

* fix
2022-01-19 12:04:51 +01:00
db3503949d Finish conversion of REALM doc to MDX 2022-01-18 18:00:30 -05:00
fe78fe98ca Enable tqdm toggling (#15167)
* feature: enable tqdm toggle

* test: add tqdm unit test

* style: run linter

* Update tests/test_tqdm_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* refactor: use tiny model, run linter

* docs: add tqdm to logging

* docs: add tqdm reference to `http_get`

* style: run linter

* Update docs/source/main_classes/logging.mdx

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* fix: use `AutoConfig` for framework agnostic testing

* chore: mv tqdm test to `test_logging.py`

* feature: implement enable/disable functions

* docs: mv docstring to comment

* chore: mv tqdm functions to `logging.py`

* docs: update docs to reference `enable/disable` funcs

* test: update test to use `enable/disable` func

* chore: update function reference in comment

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
2022-01-18 17:52:35 -05:00
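The enable/disable functions added here live in the logging utilities:

```python
# Toggle tqdm progress bars globally (added in #15167).
from transformers import AutoModel
from transformers.utils import logging

logging.disable_progress_bar()  # e.g. silence download bars
model = AutoModel.from_pretrained("bert-base-uncased")
logging.enable_progress_bar()   # restore them
```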
2c335037bd Trigger doc build 2022-01-18 17:46:29 -05:00
e118e085ea [Robust Speech Event] Add guides (#15155)
* up

* improve readme

* up

* up

* more info

* up

* up

* Apply suggestions from code review

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* add more stuff for eval

* update

* up

* Update README.md

* Update examples/research_projects/xls_r/README.md

Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.github.com>

* apply omar's suggestions

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.github.com>
2022-01-18 18:44:48 +01:00
1a354d53c4 Revert previous change - that was meant to be in a branch! 2022-01-18 17:34:26 +00:00
2085f20901 Fix a sneaky reference to compute_loss in the tests 2022-01-18 17:33:38 +00:00
979ca24e39 [Fix doc example] Wrong checkpoint name (#15079)
* fix doc example - MarianForCausalLM example

* try to keep copies

* fix copies

* fix more similar doc examples

* fix more

* fix style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-18 10:43:21 -05:00
7b3d4df47a fix: #14486 do not use BertPooler in DPR (#15068)
* fix: #14486 do not use BertPooler in DPR

* fix tf dpr as well

* finish

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-18 16:36:12 +01:00
74bec9865c Add MAE (#15120)
* First draft

* More improvements

* More improvements

* More improvements

* Fix embeddings

* Add conversion script

* Finish conversion script

* More improvements

* Fix forward pass

* Remove print statements

* Add weights initialization

* Add initialization of decoder weights

* Add support for other models in the conversion script

* Fix patch_size for huge model

* Fix most of the tests

* Fix integration test

* Fix docs

* Fix archive_list

* Apply suggestions from code review

* Improve documentation

* Apply more suggestions

* Skip some tests due to non-deterministic behaviour

* Fix test_initialization

* Remove unneccessary initialization of nn.Embedding

* Improve docs

* Fix dummies

* Remove ViTMAEFeatureExtractor from docs

* Add model to README and table of contents

* Delete inference file
2022-01-18 16:21:32 +01:00
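A usage sketch; the checkpoint name is the base MAE checkpoint on the Hub:

```python
# Sketch: masked-autoencoder pretraining forward pass with ViT-MAE.
import requests
from PIL import Image
from transformers import AutoFeatureExtractor, ViTMAEForPreTraining

feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/vit-mae-base")
model = ViTMAEForPreTraining.from_pretrained("facebook/vit-mae-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)  # reconstruction loss over the masked patches
print(outputs.loss)
```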
2ae3be5442 [MBartTokenizer] remove dep on xlm-roberta tokenizer (#15201) 2022-01-18 16:02:56 +01:00
84c60a7b50 Ignore empty subfolders when identifying submodules (#15204)
* Ignore empty subfolders when identifying submodules

* Update utils/check_inits.py
2022-01-18 09:48:46 -05:00
6f0a9b41ef Remove dependency to quiet Dependabot (#15205) 2022-01-18 09:44:35 -05:00
497346d07e [ASR pipeline] correct with lm pipeline (#15200)
* [ASR pipeline] correct with lm pipeline

* improve error
2022-01-18 15:36:22 +01:00
1144d336b6 Copies and docstring styling (#15202)
* Style docstrings when making/checking copies

* Polish
2022-01-18 09:16:55 -05:00
531336bbfd Fix deprecation warnings for int div (#15180)
* Fix deprecation warnings for int div

Co-authored-by: mgoldey <matthew.goldey@gmail.com>

* Fix import

* ensure that tensor output is python scalar

* make backward compatible

* make code more readable

* adapt test functions

Co-authored-by: mgoldey <matthew.goldey@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-18 07:28:53 -05:00
f6d3fee855 Error when code examples are improperly closed (#15186) 2022-01-18 07:27:34 -05:00
22454ae492 Add REALM (#13292)
* REALM initial commit

* Retriever OK (Update new_gelu).

* Encoder prediction score OK

* Encoder pretrained model OK

* Update retriever comments

* Update docs, tests, and imports

* Prune unused models

* Make embedder as a module `RealmEmbedder`

* Add RealmRetrieverOutput

* Update tokenization

* Pass all tests in test_modeling_realm.py

* Prune RealmModel

* Update docs

* Add training test.

* Remove completed TODO

* Style & Quality

* Prune `RealmModel`

* Fixup

* Changes:
1. Remove RealmTokenizerFast
2. Update docstrings
3. Add a method to RealmTokenizer to handle candidates tokenization.

* Fix up

* Style

* Add tokenization tests

* Update `from_pretrained` tests

* Apply suggestions

* Style & Quality

* Copy BERT model

* Fix comment to avoid docstring copying

* Make RealmBertModel private

* Fix bug

* Style

* Basic QA

* Save

* Complete reader logits

* Add searcher

* Complete searcher & reader

* Move block records init to constructor

* Fix training bug

* Add some outputs to RealmReader

* Add finetuned checkpoint variable names parsing

* Fix bug

* Update REALM config

* Add RealmForOpenQA

* Update convert_tfrecord logits

* Fix bugs

* Complete imports

* Update docs

* Update naming

* Add brute-force searcher

* Pass realm model tests

* Style

* Exclude RealmReader from common tests

* Fix

* Fix

* convert docs

* up

* up

* more make style

* up

* upload

* up

* Fix

* Update src/transformers/__init__.py

* adapt testing

* change modeling code

* fix test

* up

* up

* up

* correct more

* make retriever work

* update

* make style

* finish main structure

* Resolve merge conflict

* Make everything work

* Style

* Fixup

* Fixup

* Update training test

* fix retriever

* remove hardcoded path

* Fix

* Fix modeling test

* Update model links

* Initial retrieval test

* Fix modeling test

* Complete retrieval tests

* Fix

* style

* Fix tests

* Fix docstring example

* Minor fix of retrieval test

* Update license headers and docs

* Apply suggestions from code review

* Style

* Apply suggestions from code review

* Add an example to RealmEmbedder

* Fix

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-18 07:24:13 -05:00
b25067d807 [Fix doc example] TFRagModel (#15187)
* fix doc example - NameError: name 'PATH' is not defined

* fix name 'TFRagModel' is not defined

* correct TFRagRagSequenceForGeneration

* fix name 'tf' is not defined

* fix style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-18 07:16:30 -05:00
dea563c943 `is_ctc` needs to be updated to `self.type == "ctc"` (#15194)
* `is_ctc` needs to be updated to `self.type == "ctc"`.

* Adding fast test for this functionality.
2022-01-18 12:20:10 +01:00
32090c729f [Fix doc example] UniSpeechSatForPreTraining (#15152)
* fix doc example - cannot import name 'UniSpeechSatFeatureEncoder'

* fix ckpt name

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-18 00:34:05 +01:00
6f8e644f09 Mark bad tokenizers version (#15188) 2022-01-17 15:20:58 -05:00
edd3fce2f7 [doc] new MoE paper (#15184)
add new paper
2022-01-17 09:10:51 -08:00
9a2dabae70 Fix dtype issue in TF BART (#15178) 2022-01-17 14:02:55 +00:00
0167edc854 Added forward pass of test_inference_image_classification_head with torch.no_grad() (#14777) 2022-01-17 07:22:41 -05:00
7a787c68c6 [Speech models] Disable non-existing chunking in tests (#15163) 2022-01-16 17:15:19 +01:00
669e3c50c9 [doc] performance: Efficient Software Prebuilds (#15147)
* Efficient Software Prebuilds

* improve
2022-01-14 18:25:20 -08:00
ebc4edfe7a update from keras2onnx to tf2onnx (#15162) 2022-01-14 17:35:39 +00:00
1b730c3d11 Better dummies (#15148)
* Better dummies

* See if this fixes the issue

* Fix quality

* Style

* Add doc for DummyObject
2022-01-14 10:59:41 -05:00
b212ff9f49 Fixing flaky test (hopefully). (#15154)
* Fixing flaky test (hopefully).

* tf compliant.
2022-01-14 16:47:03 +01:00
7d9a33fb5c TF Bert inference - support np.ndarray optional arguments (#15074)
* TF Bert inference - support np.ndarray optional arguments

* apply np input tests to all TF architectures
2022-01-14 15:19:04 +00:00
4663c609b9 Add "open in hf spaces" gradio button issue #73 (#15106)
* update XLMProphetNet link

* update DPR link

* change prophetnet link

* change link MBART

* change link GPT

* update gpt2 link

* ctrl update link

* update Transformer-XL link

* Update Reformer link

* update xlnet link

* bert update link

* udpate albert link

* roberta update link

* update distilbert link

* update convbert link

* update XLM link

* xlm roberta update link

* update Flaubert link

* update electra link

* update funnel transformer and longformer

* bart update link

* pegasus update link

* udpate marianmt link

* t5 update link

* mt5 update link
2022-01-14 10:12:30 -05:00
735d2bb69b Update test_configuration_common.py (#15160) 2022-01-14 08:54:01 -05:00
51d7ebf260 fix BertTokenizerFast tokenize_chinese_chars arg (#15158)
* add new test

* fix in init

* more relevant test
2022-01-14 14:22:03 +01:00
4aa16fce6c fix doc example - object has no attribute 'lm_logits' (#15143)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-14 13:42:13 +01:00
7cbf8429d9 Make sure all submodules are properly registered (#15144)
* Make sure all submodules are properly registered

* Try to fix tests

* Fix tests
2022-01-14 07:37:51 -05:00
c4f7eb124b add TF glu activation function (#15146) 2022-01-14 10:42:08 +00:00
5f3c57fc84 Check the repo consistency in model templates test (#15141)
* Check the repo consistency in model templates test

* Fix doc template

* Fix docstrings

* Fix last docstring
2022-01-14 04:52:38 -05:00
96881729ce Remove assert on optional arg 2022-01-13 17:34:41 -05:00
1eb40338ac [deepspeed tests] fix summarization (#15149) 2022-01-13 13:48:51 -08:00
6e058e84fd Enable AMP for xla:gpu device in trainer class (#15022)
* Multiple fixes of trainer class with XLA GPU

* Make fp16 valid for xla:gpu

* Add mark_step in should_log to reduce compilation overhead
2022-01-13 15:21:00 -05:00
3fc221d077 Update model_sharing.mdx (#15142)
Fix typo
2022-01-13 12:26:02 -05:00
7b83feb50a Deprecates AdamW and adds --optim (#14744)
* Add AdamW deprecation warning

* Add --optim to Trainer

* Update src/transformers/optimization.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/optimization.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/optimization.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/optimization.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

* fix style

* fix

* Regroup adamws together

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Change --adafactor to --optim adafactor

* Use Enum for optimizer values

* fixup! Change --adafactor to --optim adafactor

* fixup! Change --adafactor to --optim adafactor

* fixup! Change --adafactor to --optim adafactor

* fixup! Use Enum for optimizer values

* Improved documentation for --adafactor

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Add mention of no_deprecation_warning

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename OptimizerOptions to OptimizerNames

* Use choices for --optim

* Move optimizer selection code to a function and add a unit test

* Change optimizer names

* Rename method

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename method

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Remove TODO comment

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename variable

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename variable

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Rename function

* Rename variable

* Parameterize the tests for supported optimizers

* Refactor

* Attempt to make tests pass on CircleCI

* Add a test with apex

* rework to add apex to parameterized; add actual train test

* fix import when torch is not available

* fix optim_test_params when torch is not available

* fix optim_test_params when torch is not available

* re-org

* small re-org

* fix test_fused_adam_no_apex

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove .value from OptimizerNames

* Rename optimizer strings s|--adam_|--adamw_|

* Also rename Enum options

* small fix

* Fix instantiation of OptimizerNames. Remove redundant test

* Use ExplicitEnum instead of Enum

* Add unit test with string optimizer

* Change optimizer default to string value

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2022-01-13 08:14:51 -08:00
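A sketch of the new flag; the optimizer strings are the renamed values from the commits above ("adamw_hf" being the default):

```python
# Sketch: selecting the optimizer via the new string-valued --optim (#14744).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="adamw_torch",  # or "adamw_hf", "adamw_apex_fused", "adafactor"
)

# The standalone AdamW class now emits a deprecation warning unless silenced:
# AdamW(model.parameters(), lr=5e-5, no_deprecation_warning=True)
```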
762416ffa8 [examples/flax/language-modeling] set loglevel (#15129) 2022-01-13 15:17:28 +01:00
74837171ab fix doc example - AssertionError: has to be configured as a decoder. (#15124)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-13 06:45:30 -05:00
6950ccec1b doc-builder -> doc-build (#15134)
* Updated script

* Commit everything

* Ready for review!

* Update .github/workflows/build_documentation.yml

Co-authored-by: Julien Chaumond <julien@huggingface.co>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2022-01-13 06:02:24 -05:00
9a94bb8e21 mBART support for run_summarization.py (#15125)
* Update run_summarization.py

* Fixed languages and added missing code

* fixed obj, docs, removed source_lang and target_lang

* make style, run_summarization.py reformatted
2022-01-12 16:39:33 -05:00
97f3beed36 Add with torch.no_grad() to DistilBERT integration test forward pass (#14979)
* refactor: wrap forward pass around no_grad context

* Update tests/test_modeling_distilbert.py

* fix: rm `no_grad` from non-integration tests

* chore: rm whitespace change
2022-01-12 10:42:39 -05:00
021f2ea987 Add ONNX configuration classes to docs (#15121)
* Add ONNX classes to main package

* Remove permalinks from ONNX guide

* Fix ToC entry

* Revert "Add ONNX classes to main package"

This reverts commit eb794a5b00d66b0b4eab234987301676d8357630.

* Add ONNX classes to main doc

* Fix syntax highlighting in doc

* Fix text

* Add FeaturesManager to doc

* Use paths to reference ONNX classes

* Add FeaturesManager to init

* Add missing ONNX paths
2022-01-12 16:33:32 +01:00
c425d60bb9 Fix link to deepspeed config 2022-01-12 09:32:53 -05:00
6820904454 Fix #14357 (#15001)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-12 14:29:09 +00:00
aa0135f2e0 fix: switch from slow to generic tokenizer class (#15122) 2022-01-12 09:12:43 -05:00
27b819b0e3 use block_size instead of max_seq_length in tf run_clm example (#15036)
* use block_size instead of max_seq_length

* fixup

* remove pad_to_block_size

Co-authored-by: Russell Klopfer <russell@kloper.us>
2022-01-12 08:57:00 -05:00
68cc4ccde2 Pipeline ASR with LM. (#15071)
* Pipeline ASR with LM.

* Revamped into `self.decoder`.

* Fixing.

* 2nd fix.

* Update src/transformers/pipelines/__init__.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fixing.

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-12 09:28:19 +01:00
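A sketch of the new behaviour, assuming pyctcdecode and kenlm are installed and using a with-LM checkpoint from the Hub; the audio path is a placeholder:

```python
# Sketch: the ASR pipeline now uses the checkpoint's LM decoder (#15071).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="patrickvonplaten/wav2vec2-base-100h-with-lm",
)
print(asr("sample.flac"))  # {"text": "..."}
```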
1a00863e95 Fix typo in doc template 2022-01-11 15:22:15 -05:00
44eaa2b303 Update TF test_step to match train_step (#15111)
* Update TF test_step to match train_step

* Update compile() warning to be clearer about what to pass
2022-01-11 19:05:39 +00:00
57b980a613 Fix saving FlaubertTokenizer configs (#14991)
All tokenizer-specific config properties must be passed to the base
class (XLMTokenizer) in order to be saved. This was not the case for
the do_lowercase config, so it was not saved by the save_pretrained()
method, and saving and reloading the tokenizer changed its behaviour.

This commit fixes it.
2022-01-11 19:19:33 +01:00
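An illustrative pattern only (the class and option names are generic, not the actual patch): an option survives save_pretrained()/reload only if it is forwarded to the base class:

```python
# Illustrative sketch of the bug class fixed here, with generic names.
from transformers import XLMTokenizer

class MyTokenizer(XLMTokenizer):
    def __init__(self, do_lowercase=False, **kwargs):
        # Forwarding the option records it in the saved tokenizer config;
        # keeping it only on `self` would lose it on save/reload.
        super().__init__(do_lowercase=do_lowercase, **kwargs)
        self.do_lowercase = do_lowercase
```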
16f0b7d72c Update ONNX docs (#14904)
* Remove docs for deprecated ONNX export

* Tidy up the CLI help messages

* Revamp ONNX docs

* Update auto-config table

* Use DistilBERT as example for consistency

* Wrap up first pass at ONNX docs

* Fix table check

* Add tweaks and introduction

* Add cross-ref

* Fix missing import

* Fix style

* Add permalinks to ONNX configs

* Clarify role of OrderedDict

* Update docs/source/serialization.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add doctest syntax to code blocks

* Remove permalinks

* Revert "Remove permalinks"

This reverts commit 099701daf0db27823457867938efdb2d4f22a7c1.

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-11 18:06:05 +01:00
704d1feca1 Doc styler tip (#15105)
* Add new lines before/after tips

* Check end of lines
2022-01-11 11:45:39 -05:00
68d925195e Merge branch 'master' into master 2022-01-11 11:11:29 -05:00
7480ded658 Fix failing test (#15104) 2022-01-11 15:57:34 +01:00
28e091430e Add Nystromformer (#14659)
* Initial commit

* Config and modelling changes

Added Nystromformer-specific attributes to config and removed all decoder functionality from modelling.

* Modelling and test changes

Added Nystrom approximation and removed decoder tests.

* Code quality fixes

* Modeling changes and conversion script

Initial commits to conversion script, modeling changes.

* Minor modeling changes and conversion script

* Modeling changes

* Correct modeling, add tests and documentation

* Code refactor

* Remove tokenizers

* Code refactor

* Update __init__.py

* Fix bugs

* Update src/transformers/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/nystromformer/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/model_doc/nystromformer.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/nystromformer/configuration_nystromformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/nystromformer/configuration_nystromformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/nystromformer/configuration_nystromformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/nystromformer/configuration_nystromformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/nystromformer/convert_nystromformer_original_pytorch_checkpoint_to_pytorch.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/nystromformer/configuration_nystromformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update modeling and test_modeling

* Code refactor

* .rst to .mdx

* doc changes

* Doc changes

* Update modeling_nystromformer.py

* Doc changes

* Fix copies

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update configuration_nystromformer.py

* Fix copies

* Update tests/test_modeling_nystromformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update test_modeling_nystromformer.py

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Fix code style

* Update modeling_nystromformer.py

* Update modeling_nystromformer.py

* Fix code style

* Reformat modeling file

* Update modeling_nystromformer.py

* Modify NystromformerForMultipleChoice

* Fix code quality

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Code style changes and torch.no_grad()

* make style

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-11 14:25:49 +01:00
444ea95a80 Print out durations of all scheduled tests (#15102) 2022-01-11 08:15:59 -05:00
285131bfb4 change metric_key_prefix in seq2seq_trainer.py (#15099)
It fixes the problem that metric_key_prefix differed from the one used in Trainer.
2022-01-11 07:44:29 -05:00
c4fa908fa9 Adds IBERT to models exportable with ONNX (#14868)
* Add IBertOnnxConfig and tests

* add all the supported features for IBERT and remove outputs in IbertOnnxConfig

* use OnnxConfig

* fix codestyle

* remove serialization.rst

* codestyle
2022-01-11 12:17:08 +01:00
efb35a4107 [Wav2Vec2ProcessorWithLM] improve decoder download (#15040) 2022-01-11 05:59:38 -05:00
6ea6266625 Fix cookiecutter (#15100) 2022-01-11 05:57:26 -05:00
68810aa26c fix doc example - TypeError: forward() got an unexpected keyword argument 'input_ids' (#15092)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-11 04:04:23 -05:00
ca76618d6b Take gradient accumulation into account when defining samplers (#15095)
* Take gradient accumulation into account when defining samplers

* style
2022-01-11 03:16:39 -05:00
9dc8fb2fc7 Add test to check reported training loss (#15096)
* Add test

* Add tests for the reported train loss
2022-01-11 03:14:11 -05:00
5cd7086fdb XLM-ProphetNet Spaces badge 2022-01-11 00:11:31 -05:00
4e3208662e DPR Spaces badge 2022-01-10 13:50:40 -05:00
ac2c06d492 ProphetNet spaces badge 2022-01-10 13:43:34 -05:00
bf0201e184 MBART spaces badge 2022-01-10 13:37:17 -05:00
b67fd797be Add TFVisionEncoderDecoderModel (#14148)
* Start the work on TFVisionEncoderDecoderModel

* Expose TFVisionEncoderDecoderModel

* fix import

* Add modeling_tf_vision_encoder_decoder to _ignore_modules in get_model_modules()

* reorder

* Apply the fix for checkpoint loading as in #14016

* remove attention_mask + fix VISION_DUMMY_INPUTS

* A minimal change to make TF generate() work for vision models as encoder in encoder-decoder setting

* fix wrong condition: shape_list(input_ids) == 2

* add tests

* use personal TFViTModel checkpoint (for now)

* Add equivalence tests + projection layer

* style

* make sure projection layer can run

* Add examples

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Clean comments (need to work on TODOs for PyTorch models)

* Remove TF -> PT in check_pt_tf_equivalence for TFVisionEncoderDecoderModel

* fixes

* Revert changes in PT code.

* Update tests/test_modeling_tf_vision_encoder_decoder.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Add test_inference_coco_en for TF test

* fix quality

* fix name

* build doc

* add main_input_name

* Fix ckpt name in test

* fix diff between master and this PR

* fix doc

* fix style and quality

* fix missing doc

* fix labels handling

* Delete auto.rst

* Add the changes done in #14016

* fix prefix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make style

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-10 13:30:14 -05:00
c9504b2f50 MT5 Spaces badge 2022-01-10 12:57:08 -05:00
daec528ca9 T5 Spaces badge 2022-01-10 12:51:39 -05:00
0554e4d5c5 MarianMT Spaces badge 2022-01-10 12:47:12 -05:00
7ec6aad23d Pegasus Spaces badge 2022-01-10 12:39:22 -05:00
03f8b9c9e0 BART Spaces badge 2022-01-10 12:33:59 -05:00
37bc0b4e53 [performance doc] Power and Cooling (#14935)
* [performance doc] Power and Cooling

* more docs

* Update docs/source/performance.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* reword

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2022-01-10 09:21:04 -08:00
20f169b523 Longformer Spaces badge 2022-01-10 12:14:18 -05:00
3e9fdcf019 [DOC] fix doc examples for bart-like models (#15093)
* fix doc examples

* remove double colons
2022-01-10 18:13:28 +01:00
4fbc924d0a Funnel Transformer spaces badge 2022-01-10 12:06:05 -05:00
61d18ae035 Happy New Year! (#15094) 2022-01-10 12:05:57 -05:00
222c09a635 ELECTRA Spaces badge 2022-01-10 11:53:23 -05:00
31838d3e11 [doc] normalize HF Transformers string (#15023) 2022-01-10 08:44:33 -08:00
84f360e862 FlauBERT spaces badge 2022-01-10 11:41:10 -05:00
9f33116898 XLM-Roberta Spaces badge 2022-01-10 10:54:18 -05:00
20fa9eb035 XLM Spaces badge 2022-01-10 10:48:06 -05:00
16b6df6fca ConvBERT spaces badge 2022-01-10 10:33:03 -05:00
f21bc4215a Use tqdm.auto in Pipeline docs (#14920)
It's better for e.g. notebooks.
2022-01-10 10:28:34 -05:00
f012c00ada Model summary horizontal banners (#15058) 2022-01-10 10:06:14 -05:00
af9cb94974 Fix style 2022-01-10 09:40:20 -05:00
533624c5a9 fix doc example - AttributeError: type object 'RagModel' has no attribute 'from_question_encoder_generator_pretrained' (#15076)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-10 09:28:39 -05:00
b2c477fc6d support the trocr small models (#14893)
* support the trocr small models

* resolve conflict

* Update docs/source/model_doc/trocr.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/model_doc/trocr.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/model_doc/trocr.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/trocr/processing_trocr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/trocr/processing_trocr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/trocr/processing_trocr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/trocr/processing_trocr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* fix unexpected indent in processing_trocr.py

* Update src/transformers/models/trocr/processing_trocr.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* update the docstring of processing_trocr

* remove extra space

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-01-10 09:28:03 -05:00
42d57549b8 Change assignee for tokenizers (#15088) 2022-01-10 09:22:48 -05:00
a54961c5f7 Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x (#15019)
* Make OpenAIGPTTokenizer work with SpaCy 3.x

SpaCy 3.x introduced an API change to creating the tokenizer that
breaks OpenAIGPTTokenizer. The old API for creating the tokenizer in
SpaCy 2.x no longer works under SpaCy 3.x, but the new API for creating
the tokenizer in SpaCy 3.x DOES work under SpaCy 2.x. Switching to the
new API should allow OpenAIGPTTokenizer to work under both SpaCy 2.x and
SpaCy 3.x versions.

* Add is_spacy_available and is_ftfy_available methods to file utils

* Add spacy and ftfy unittest decorator to testing utils

* Add tests for OpenAIGPTTokenizer that require spacy and ftfy

* Modify CircleCI config to run tests that require spacy and ftfy

* Remove unneeded unittest decorators and reuse test code

* Run make fixup
2022-01-10 07:53:20 -05:00
9fbf7c87c3 Update check_repo.py (#15014)
added new line
2022-01-10 06:55:43 -05:00
0a03a86813 fix model table cell text alignment (#14999)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-10 06:44:11 -05:00
d72343d2b8 [Wav2Vec2 Speech Event] Add speech event v2 (#15083)
* up

* up

* up

* up

* up

* up

* improve

* up

* up

* Update src/transformers/trainer.py

* up

* up

* up
2022-01-10 10:46:21 +01:00
768e6c1449 Fix convert for newer megatron-lm bert model (#14082)
* Fix convert for newer megatron-lm models

* Save megatron-bert config in a proper way

* Fix code style
2022-01-08 11:33:55 -08:00
623b4f7c63 [VisionTextDualEncoder] Add token_type_ids param (#15073)
* fix doc example - TypeError: get_text_features() got an unexpected keyword argument 'token_type_ids'

* add token_type_ids param

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-07 20:02:49 +01:00
5be1242ac0 Merge branch 'huggingface:master' into master 2022-01-07 11:48:22 -05:00
484e7a441f Distilbert spaces badge 2022-01-07 11:47:56 -05:00
ac224bb079 [Fix doc examples] Add missing from_pretrained (#15044)
* fix doc example - ValueError: Parameter config should be an instance of class `PretrainedConfig`

* Update src/transformers/models/segformer/modeling_segformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2022-01-07 16:55:59 +01:00
f18c6fa94c Resubmit changes after rebase to master (#14982) 2022-01-07 08:34:12 +01:00
1d71227295 Roberta spaces badge 2022-01-06 18:50:19 -05:00
e36a83d3a3 Merge branch 'huggingface:master' into master 2022-01-06 18:44:59 -05:00
cac877425c ALBERT spaces badge 2022-01-06 13:01:23 -05:00
794441c379 BERT spaces badge 2022-01-06 12:22:09 -05:00
f872f18dca XLNet spaces badge 2022-01-06 12:09:50 -05:00
8d187e7feb Reformer Spaces badge 2022-01-06 11:59:21 -05:00
cc406da4de [VisionTextDualEncoder] Fix doc example
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-06 17:59:06 +01:00
59fb636948 Transformer-XL badge 2022-01-06 11:47:41 -05:00
25b8b8a6f2 Merge branch 'huggingface:master' into master 2022-01-06 11:42:14 -05:00
b67f345d00 Update run_speech_recognition_seq2seq.py (#14967) 2022-01-06 19:26:45 +03:00
f71fb5c36e Add 'with torch.no_grad()' to BertGeneration integration test forward passes (#14963) 2022-01-06 10:39:13 -05:00
d2183a46fb Remove old asserts. (#15012) 2022-01-06 09:45:41 -05:00
83c552d390 Add detectron2 to Github actions (#15053) 2022-01-06 08:53:58 -05:00
5ab87cd4da wrapped forward passes in torch.no_grad() (#15037) 2022-01-06 08:48:49 -05:00
5a06118b39 Enabling TF on image-classification pipeline. (#15030) 2022-01-06 14:16:00 +01:00
9f89fa02ed Add Flax image captioning example (#14864)
* add image captioning example

* update README

* fix style & quality

* simplify

* apply review suggestions

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Apply review suggestions

* add comments about using np instead jax array

* remove unused lines

* add model creation script

* only support from_pretrained

* fix style

* fix

* not use cache_dir when creating model

* fix tokenizer creation

* update README

* fix quality

* apply suggestion

* simplify some blocks

* Update examples/flax/image-captioning/README.md


* Update examples/flax/image-captioning/run_image_captioning_flax.py

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* apply suggestion

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2022-01-06 14:00:54 +01:00
2e9af29494 [CLIP] Fix TF test (#15042) 2022-01-05 16:58:42 +01:00
443fdaf29f [SpeechEncoderDecoder] Fix from pretrained (#15043) 2022-01-05 16:54:39 +01:00
ae929dcbbd [CLIP] Fix PT test (#15041) 2022-01-05 14:21:04 +01:00
65cb94ff77 Adding QoL for batch_size arg (like others enabled everywhere). (#15027)
* Adding QoL for `batch_size` arg (like others enabled everywhere).

* Typo.
2022-01-05 12:16:23 +01:00
e34dd055e9 Fix doc example: mask_time_indices (numpy) has no attribute 'to' (#15033)
* fix doc example - AttributeError: 'numpy.ndarray' object has no attribute 'to'

* fix more

* Apply suggestions from code review

* Update src/transformers/models/unispeech/modeling_unispeech.py

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-05 11:34:08 +01:00
927f654427 [megatron convert] PYTHONPATH requirements (#14956)
* [megatron convert] PYTHONPATH requirements

* more info
2022-01-05 04:09:52 -05:00
2380136722 add spaces badges 2022-01-04 16:13:57 -05:00
857ab55c01 [doc] Update parallelism.mdx (#15018)
* Update parallelism.mdx

* Update parallelism.mdx
2022-01-04 09:58:27 -08:00
19d37c2dd3 Hotfix chunk_length_s instead of _ms. (#15029)
* Hotfix `chunk_length_s` instead of `_ms`.

* Adding fix of `pad_token`, which should be the last/previous token, for proper CTC decoding

* Fixing ChunkPipeline unwrapping.

* Adding a PackIterator specific test.
2022-01-04 14:07:44 +01:00
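A sketch of the corrected argument, in seconds rather than milliseconds; the audio path is a placeholder:

```python
# Sketch: chunked ASR on long files with the (renamed) chunk_length_s arg.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
# 10 s chunks, with (left, right) striding at the chunk boundaries
result = asr("long_audio.flac", chunk_length_s=10, stride_length_s=(4, 2))
print(result["text"])
```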
21aecc0971 Add Flax RoFormer (#15005)
* Add FlaxRoFormer

* Clean code + make quality

* Fix output pooling for FlaxRoFormerForMultipleChoiceModule

* Apply suggestions from code review

* add flax model to repos

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2022-01-04 13:23:10 +01:00
9e1775dd23 Fix a little typo (#15002) 2022-01-04 12:59:47 +01:00
774ed4a027 Fix Code block (#14983) 2022-01-04 12:59:20 +01:00
f2ab21833f Update parallelism.mdx (#15013)
* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx

* Update parallelism.mdx
2022-01-03 11:49:27 -08:00
dbac8899fe [Tests] Correct Wav2Vec2 & WavLM tests (#15015)
* up

* up

* up
2022-01-03 20:19:04 +01:00
0b4c3a1a53 fix missing import (#15016)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2022-01-03 19:11:47 +01:00
38f95d1846 Large audio chunking for the existing ASR pipeline (#14896)
* Naive ASR chunking

* Fixing batching for ASR.

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-01-03 16:54:17 +01:00
d33dc7966a Improve truncation_side (#14947)
* Enabling `truncation_side` for Slow and Fast tokenizer.

Co-Authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com>

* Disable failing tests.

* Layout xlm.

* assert -> assertEqual.

Co-authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com>
2022-01-03 16:18:39 +01:00
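A sketch of the option, which now behaves the same on slow and fast tokenizers:

```python
# Sketch: truncate from the left instead of the right (#14947).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", truncation_side="left")
enc = tokenizer("a very long input " * 100, truncation=True, max_length=8)
# tokens are now dropped from the start of the sequence
```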
8c2618e6aa Fixing t2t pipelines lists outputs. (#15008)
Backward compatibility broken in
https://github.com/huggingface/transformers/pull/14988
2022-01-03 14:49:58 +01:00
8f6373c61c Map model_type and doc pages names (#14944)
* Map model_type and doc pages names

* Add script

* Fix typo

* Quality

* Manual check for Auto

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2022-01-03 05:08:55 -05:00
e68c3756fe Allow training to resume even if RNG states are not properly loaded (#14994)
* Allow training to resume even if RNG states are not properly loaded

* Proper f-string
2021-12-30 17:03:20 -05:00
08cb5718ec Enabling tokenizers upgrade. (#14941)
* Enabling `tokenizers` upgrade.

* Moved ugly comment.

* Tokenizers==0.11.1 needs an update to keep the borrow checker happy in highly contiguous calls.

* Support both 0.11.1 and 0.11.0
2021-12-30 17:30:58 +01:00
f8a989cfb2 Adding num_return_sequences support for text2text generation. (#14988)
* Adding `num_return_sequences` support for text2text generation.

Co-Authored-By: Enze <pu.miao@foxmail.com>

* Update tests/test_pipelines_text2text_generation.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_pipelines_text2text_generation.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Enze <pu.miao@foxmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-30 16:17:15 +01:00
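A sketch of the new parameter; multiple generations per input come back as a list of dicts:

```python
# Sketch: num_return_sequences on the text2text pipeline (#14988).
from transformers import pipeline

t2t = pipeline("text2text-generation", model="t5-small")
outputs = t2t(
    "translate English to German: hello",
    num_return_sequences=2,
    num_beams=2,
)
# e.g. [{"generated_text": "..."}, {"generated_text": "..."}]
```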
c043ce6cfd [Generate] correct encoder_outputs are passed without attention_mask (#14980)
* [Generate] correct encoder_outputs are passed without attention_mask

* Apply suggestions from code review

* up
2021-12-30 10:16:03 +01:00
a1392883ce [AutoProcessor] Correct AutoProcessor and automatically add processor… (#14881)
* [AutoProcessor] Correct AutoProcessor and automatically add processor class

* up

* up

* up

* up

* up

* up

* up

* up

* continue tomorrow

* up

* up

* up

* make processor class private

* fix loop
2021-12-30 09:56:43 +01:00
d7d60df0ec Fixing a pathological case for slow tokenizers (#14981)
* Fixing a pathological case for slow tokenizers

* Update src/transformers/tokenization_utils.py
2021-12-30 09:10:34 +01:00
d1ba56d8d8 remove absl workaround as it's no longer needed (#14909)
The absl workaround hasn't been needed since 2019-04 (https://github.com/abseil/abseil-py/issues/99), so it should be safe to remove it.
2021-12-29 17:18:03 -05:00
04cddaf402 refactor: replace assert with ValueError (#14970) 2021-12-29 10:09:54 -05:00
600496fa50 [Wav2Vec2] Rename model's feature extractor to feature encoder (#14959)
* rename classes

* clean up more namings

* remove bogus file

* Apply suggestions from code review

* Apply suggestions from code review

* replace more names

* more regex replace

* make style

* correct

* correct more

* make style

* finish

* correct more in wav2vec2

* make style

* improve freeze_extractor

* add aliases

* add tf aliases
2021-12-28 20:33:23 +01:00
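A sketch of the renamed API; per the commits above, the old name is kept as a deprecated alias:

```python
# Sketch: freezing the (renamed) feature encoder of a Wav2Vec2 model.
from transformers import Wav2Vec2ForCTC

model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.freeze_feature_encoder()       # new name
# model.freeze_feature_extractor()   # alias, still works but deprecated
```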
1bfa347707 [Tests] Speed up tokenizer tests (#14964)
* speed up canine and mluke

* speed up mbart and mbart50 toks

* upload files
2021-12-28 17:02:50 +01:00
f80775df2b Update README.md (#14965) 2021-12-28 13:41:27 +01:00
1e847b40c0 [WavLM] give model for precision (#14958) 2021-12-28 11:07:05 +01:00
1c121916f3 Add Speech Seq2Seq Training script (#14792)
* start

* add gradient checkpointing and feature extractor freezing

* Apply suggestions from code review

* up

* up

* up

* correct

* up

* more changes

* up

* up

* up

* remove rst
2021-12-28 10:20:51 +01:00
10fd4fa1a6 [doc] :class: hunt (#14955)
* [doc] :class: hunt

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix the fix + style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-27 17:17:38 -08:00
2c5597f6c7 Style 2021-12-27 19:18:08 -05:00
b5e2b183af Doc styler examples (#14953)
* Fix bad examples

* Add black formatting to style_doc

* Use first nonempty line

* Put it at the right place

* Don't add spaces to empty lines

* Better templates

* Deal with triple quotes in docstrings

* Result of style_doc

* Enable mdx treatment and fix code examples in MDXs

* Result of doc styler on doc source files

* Last fixes

* Break copy from
2021-12-27 19:07:46 -05:00
e13f72fbff [doc] :obj: hunt (#14954)
* redo sans examples

* style
2021-12-27 15:49:48 -08:00
133c5e40c4 [doc] consistent True/False/None default format (#14951)
* [doc] consistent True/False/None default format

* Update src/transformers/models/xlnet/modeling_xlnet.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-27 14:31:40 -08:00
b2f500256e Convert last rst file (#14952) 2021-12-27 17:09:37 -05:00
87e6e4fe5c Doc styler v2 (#14950)
* New doc styler

* Fix issue with args at the start

* Code sample fixes

* Style code examples in MDX

* Fix more patterns

* Typo

* Typo

* More patterns

* Do without black for now

* Get more info in error

* Docstring style

* Re-enable check

* Quality

* Fix add_end_docstring decorator

* Fix docstring
2021-12-27 16:31:21 -05:00
c1138273d4 Fix duplicate call to save_checkpoint when using deepspeed (#14946)
* Fix duplicate call to save_checkpoint when using deepspeed / stage3_gather_fp16_weights_on_model_save

* Revert "Fix duplicate call to save_checkpoint when using deepspeed / stage3_gather_fp16_weights_on_model_save"

This reverts commit 6a3dec0397723a8417351dc38fdebf14ab17756c.

* Delete correct duplicate invocation of deepspeed save_checkpoint
2021-12-27 11:25:26 -08:00
03885a3f50 fix to issue #14833 in data_collator - consider no labels (#14930) 2021-12-27 11:48:48 -05:00
501307b58b Add ElectraForCausalLM -> Enable Electra encoder-decoder model (#14729)
* Add ElectraForCausalLM and cover some basic tests & need to fix a few tests

* Fix bugs

* make style

* make fix-copies

* Update doc

* Change docstring to markdown format

* Remove redundant update_keys_to_ignore
2021-12-27 12:37:52 +01:00
b058490ceb ChunkPipeline (batch_size enabled on zero-cls and qa pipelines. (#14225)
* Pipeline chunks.

* Batching for Chunking pipelines ?

* Batching for `question-answering` and `zero-shot-cls`.

* Fixing for FNet.

* Making ASR a chunk pipeline.

* Chunking ASR API.

* doc style.

* Fixing ASR test.

* Fixing QA error (p_mask, padding is 1, not 0).

* Enable both vad and simple chunking.

* Max length for vad.

* remove inference mode, crashing on s2t.

* Revert ChunkPipeline for ASRpipeline.

Too many knobs for simple integration within the pipeline; better to stick
to external convenience functions instead: more control to be had, a
simpler pipeline, and easier to replace with other things later.

* Drop necessity for PT for these.

* Enabling generators.

* Add mic + cleanup.

* Typo.

* Typo2.

* Remove ASR work, it does not belong in this PR anymore.

* Update src/transformers/pipelines/pt_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/pipelines/zero_shot_classification.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Adding many comments.

* Doc quality.

* `hidden_states` handling.

* Adding doc.

* Bad rebase.

* Autofixing docs.

* Fixing CRITICAL bug in the new Zerocls pipeline.

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-12-27 11:26:20 +01:00
705ca7f21b Fix Perceiver docs (#14917) 2021-12-24 11:28:47 +01:00
116829900a [WavLM] fix wavlm docs (#14910) 2021-12-23 23:17:20 +01:00
415810664b [doc] install - add jax (#14912)
As `jax` CUDA requires special instructions to be installed correctly, add a link to the jax installation instructions.

Note: the Flax install page only covers CPU jax installation info.
2021-12-23 13:12:59 -08:00
676643c6d6 Better logic for getting tokenizer config in AutoTokenizer (#14906)
* Better logic for getting tokenizer config in AutoTokenizer

* Remove needless import

* Remove debug statement

* Address review comments
2021-12-23 14:18:07 -05:00
f566c6e3b7 Fix failing GPU trainer tests (#14903)
* Fix failing GPU trainer tests

* Remove print statements
2021-12-23 13:59:33 -05:00
fe4197ab11 [Generate] Remove attention_mask and integrate model_main_input_name (#14856)
* up

* save

* correct

* up

* correct more

* up

* up

* up

* up

* up

* correct

* fix tf

* fix

* remove tokenizer
2021-12-23 19:43:37 +01:00
86b40073e9 [doc] post-porting (#14890)
found a few oddities:

1. https://huggingface.co/docs/transformers/main_classes/logging#transformers.utils.logging.enable_explicit_format
has a stray `::` - this PR fixes it

2. this one looks broken too:
https://huggingface.co/docs/transformers/main_classes/logging#transformers.utils.logging.set_verbosity
has a stray `<`

but I'm not sure where this one is coming from
2021-12-23 10:19:34 -08:00
ee55ea692b Update diarization and WavLM tolerances (#14902) 2021-12-23 19:53:56 +03:00
ef47d4f848 [AutoTokenizer] Fix incorrect from pretrained (#14900) 2021-12-23 17:22:33 +01:00
8f2cc1c3ab Add TFCLIPModel (#13967)
* Start the work for TFCLIPModel

* Convert to TF code (TODO: loss + doc)

* Clean up

* Fix pooled_output for TFCLIPTextTransformer - using tf.gather_nd

* assert -> raise error

* Expose TFCLIPModel

* Deal with dummy_inputs

* Add tests

* Fix all tests. TODO: manual check weight loading + add more comments

* Fix pt tf equivalence test

* fixes

* update TFCLIPVisionEmbeddings's Conv2D

* Fix loss + overwrite test_pt_tf_model_equivalence from common

* Add a comment about the change about MainLayer in test_keras_save_load

* Set return_loss=True in TFCLIPModelTester + make tests pass

* overwrite test_pt_tf_model_equivalence from tf common

* fix base_model_prefix

* Fix examples

* remove unused

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply review suggestions

* change self.pre_layrnorm to self.pre_layernorm

* apply more review suggestions

* return attention probs before dropout (to align with PT)

* fix weight init

* fix

* build doc

* fix missing doc

* fix for test

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-23 11:19:44 -05:00
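
A minimal usage sketch for the new TF port; the checkpoint name and the dummy image are illustrative:

```python
import numpy as np
import tensorflow as tf
from PIL import Image
from transformers import CLIPProcessor, TFCLIPModel

model = TFCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.fromarray(np.zeros((224, 224, 3), dtype=np.uint8))  # dummy image
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image, return_tensors="tf", padding=True,
)
outputs = model(**inputs)
probs = tf.nn.softmax(outputs.logits_per_image, axis=-1)  # image-text match scores
```
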
2d30443cd3 Set run_name in MLflowCallback (#14894)
* Set run_name in MLflowCallback

* Update the docs for `run_name` argument
2021-12-23 10:53:33 -05:00
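
In practice this means the existing `run_name` training argument is picked up by the MLflow integration; a sketch with placeholder values:

```python
from transformers import TrainingArguments

# run_name was already a TrainingArguments field; per this PR it is now
# forwarded to mlflow.start_run by the MLflowCallback.
args = TrainingArguments(output_dir="out", run_name="my-experiment", report_to=["mlflow"])
```
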
1d651868d6 add custom stopping criteria to human eval script (#14897) 2021-12-23 14:59:11 +01:00
6b655cc63f Add ONNX support for MarianMT models (#14586)
* First commit to add MarianMT to ONNX

* Now MarianModel.forward() automatically generates decoder_input_ids, like BartModel.forward()

* Adjusted MarianOnnxConfig.inputs and outputs to work with seq2seq-lm feature

* Style fix

* Added support for other features for already supported models

* Partial support for causal and seq2seq models

* Partial support for causal and seq2seq models

* Add default task for MarianMT ONNX

* Remove automatic creation of decoder_input_ids

* Extend inputs and outputs for MarianMT ONNX config

* Add MarianMT to ONNX unit tests

* Refactor

* OnnxSeq2SeqConfigWithPast to support seq2seq models

* Parameterized the onnx tests

* Restored run_mlm.py

* Restored run_mlm.py

* [WIP] BART update

* BART and MBART

* Add past_key_values and fix dummy decoder inputs

Using a sequence length of 1 in generate_dummy_outputs() produces large discrepancies, presumably due to some hidden optimisations.

* Refactor MarianOnnxConfig to remove custom past_key_values logic

* Fix quality

* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"

This reverts commit 0f4e39c5599523c110cd713f60a3bfa145dad807.

* is_torch_available test to avoid failing imports

* sorting parameterize parameters to solve ERROR gw0 gw1

* tests fix

* tests fix

* GPT2 with past fix

* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially

* Removed onnx file

* Refactor Marian export to account for base changes

* Fix copies

* Implemented suggestions

* Extend support for causal LM

* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"

This reverts commit 0f4e39c5599523c110cd713f60a3bfa145dad807.

* is_torch_available test to avoid failing imports

* sorting parameterize parameters to solve ERROR gw0 gw1

* tests fix

* tests fix

* GPT2 with past fix

* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially

* Removed onnx file

* Implemented suggestions

* Fixed __init__ to resolve conflict with master

* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"

This reverts commit 0f4e39c5599523c110cd713f60a3bfa145dad807.

* is_torch_available test to avoid failing imports

* sorting parameterize parameters to solve ERROR gw0 gw1

* tests fix

* tests fix

* GPT2 with past fix

* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially

* Removed onnx file

* Implemented suggestions

* Fixed __init__ to resolve conflict with master

* Remove commented import

* Remove ONNX model

* Remove redundant class method

* Tidy up imports

* Fix quality

* Refactor dummy input function

* Add copied from statements to Marian config functions

* Remove false copied from comments

* Fix copy from comment

Co-authored-by: Massimiliano Bruni <massimiliano.bruni@hcl.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
2021-12-23 13:35:56 +01:00
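
A rough sketch of the export path this enables, using the `transformers.onnx` package; the checkpoint, feature name, and output path are assumptions based on the PR description:

```python
from pathlib import Path
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers.onnx import FeaturesManager, export

ckpt = "Helsinki-NLP/opus-mt-en-de"  # illustrative Marian checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)
tokenizer = AutoTokenizer.from_pretrained(ckpt)

# Look up the seq2seq-lm feature this PR registers for Marian.
model_kind, config_ctor = FeaturesManager.check_supported_model_or_raise(model, feature="seq2seq-lm")
onnx_config = config_ctor(model.config)
export(tokenizer, model, onnx_config, onnx_config.default_onnx_opset, Path("marian.onnx"))
```
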
6a7b9da2ae Add 'with torch.no_grad()' to integration test forward pass (#14808) 2021-12-23 04:23:39 -05:00
d8c09c6541 Fix AttributeError from PreTrainedTokenizerFast.decoder (#14691) 2021-12-23 04:19:25 -05:00
4210579522 Fix doc examples: ... takes no keyword arguments (#14701)
* Fix doc examples: ... takes no keyword arguments

* fix copies

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-12-23 04:07:21 -05:00
355dc0ce67 Fix installation instructions for BART ONNX example (#14885) 2021-12-23 04:05:32 -05:00
207594be81 Convert rst files (#14888)
* Convert all tutorials and guides

* Convert all remaining rst to mdx

* Track and fix bad links
2021-12-22 16:14:35 -05:00
b0c7d2ec58 Keras metric callback (#14867)
* Working on splitting out labels

* First working version

* Fixed concatenation of outputs and labels

* val_dataset -> eval_dataset

* Only pass input arrays in tokenizer.model_input_names

* Only pass input arrays in tokenizer.model_input_names

* Only remove unexpected keys when predict_with_generate is True

* Adding proper docstring

* Adding example to docstring

* Add a proper ROUGE metric example

* Add a proper ROUGE metric example

* Add version checking

* Update src/transformers/keras_callbacks.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/keras_callbacks.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/keras_callbacks.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/keras_callbacks.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Remove requirement for tokenizer with predict_with_generate

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-22 20:35:39 +00:00
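
A hedged sketch of how the callback is meant to be wired up; `model`, `train_dataset`, and `eval_dataset` are assumed to exist (a compiled Keras transformers model and batched `tf.data.Dataset`s yielding `(features, labels)`):

```python
import numpy as np
from transformers.keras_callbacks import KerasMetricCallback

def compute_metrics(eval_predictions):
    # Receives the concatenated model outputs and labels gathered at epoch end.
    predictions, labels = eval_predictions
    return {"accuracy": float((np.argmax(predictions, axis=-1) == labels).mean())}

metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=eval_dataset)
model.fit(train_dataset, validation_data=eval_dataset, callbacks=[metric_callback], epochs=3)
```
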
fa39ff9fc4 Docs for v4.16.0dev0 2021-12-22 20:39:44 +01:00
05fa1a7ac1 Release: v4.15.0 2021-12-22 18:43:15 +01:00
87a033d9fa Properly indent return block (#14887) 2021-12-22 12:28:45 -05:00
13504dcbea Onnx enable tasks for supported models (part 2) (#14700)
* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"

This reverts commit 0f4e39c5599523c110cd713f60a3bfa145dad807.

* is_torch_available test to avoid failing imports

* sorting parameterize parameters to solve ERROR gw0 gw1

* tests fix

* tests fix

* GPT2 with past fix

* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially

* Removed onnx file

* Implemented suggestions

* Fixed __init__ to resolve conflict with master

* Remove commented import
2021-12-22 14:43:11 +01:00
1045a36c1f Fix pytorch image classification example (#14883)
* Update example

* Remove skip in tests
2021-12-22 14:42:19 +01:00
7df4b90c76 Fix Perceiver docs (#14879) 2021-12-22 14:18:03 +01:00
e37bc579fc Fix typo in error message 2021-12-22 08:19:36 -05:00
17efc806b4 IterableDatasetShard should use per device batch size instead of real batch size (#14714) 2021-12-22 07:52:07 -05:00
2a56edb321 Updated deberta attention (#14625)
* Removed unused p2p attention handling

* Updated DeBERTa configuration

* Updated TF DeBERTa attention

* Rolled back accidental comment deletion

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-12-22 07:36:08 -05:00
824fd44fc3 Feature/fix slow test in mluke (#14749)
* make MLukeTokenizerTest fast

* make LukeTokenizerTest fast

* add entry to _toctree.yaml
2021-12-22 06:35:59 -05:00
c94c1b8967 update the arguments add_prefix_space and trim_offsets in backend_tokenizer.post_processor of RobertaTokenizerFast (#14752)
* add tests

* change post-processor, pre-tokenizer and decoder (can't update decoder)

* update test (remove decoder which doesn't depend on trim and add_prefix)

* just update the post_processor

* fix change

* `trim_offsets` has no influence on `pre_tokenizer`

* remove a test that need some input from the `tokenizers` lib maintainers

* format

* add new test offsets roberta

* polish comments
2021-12-22 10:51:55 +01:00
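
A small sketch of the behavior this fixes, with illustrative inputs; both flags should now reach the backend post-processor:

```python
from transformers import RobertaTokenizerFast

tok = RobertaTokenizerFast.from_pretrained(
    "roberta-base", add_prefix_space=True, trim_offsets=True
)
enc = tok("Hello world", return_offsets_mapping=True)
print(enc["offset_mapping"])  # character offsets should now respect trim_offsets
```
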
ec3567fe20 Convert model files from rst to mdx (#14865)
* First pass

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-22 03:27:30 -05:00
d0422de563 Fix doc mistakes (#14874)
* Remove double returns

* Last fixes

* Quality

* Last fix for Lxmert
2021-12-21 18:54:41 -05:00
e846a56ca4 Fix FlaxMarianMTModel return block. (#14873)
* Fixes in marian doc

* Another time

* Add return block in FlaxMarianMTModel
2021-12-21 17:57:37 -05:00
a6b7b47a39 Fixes in marian doc (#14872)
* Fixes in marian doc

* Another time
2021-12-21 17:17:02 -05:00
eec9c8bbd7 Fix FLAX_MULTIPLE_CHOICE_SAMPLE typo (#14871) 2021-12-21 16:54:10 -05:00
e51c7b5872 Skip failing test 2021-12-21 15:15:17 -05:00
27b3031de2 Mass conversion of documentation from rst to Markdown (#14866)
* Convert docstrings of all configurations and tokenizers

* Processors and fixes

* Last modeling files and fixes to models

* Pipeline modules

* Utils files

* Data submodule

* All the other files

* Style

* Missing examples

* Style again

* Fix copies

* Say bye bye to rst docstrings forever
2021-12-21 15:06:33 -05:00
185876392c [doc porting] several docs (#14858)
* [doc porting] 2 docs

* [doc porting] 2 docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/main_classes/deepspeed.mdx

* cleanup

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-21 09:55:25 -08:00
033c3ed95a [examples/summarization] deal with None in data records (#14816)
* [examples/summarization] deal with None in data records

* rewrite to use a simpler (slower) variant
2021-12-21 09:17:28 -08:00
c075fb7855 Replace commit sha by commit url for update jobs (#14852)
* Replace commit sha by commit url for update jobs

* Typo

* Update .github/workflows/build_documentation.yml

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Apply review comments

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2021-12-21 11:17:11 -05:00
5722d05831 Add custom stopping_criteria and logits_processor to generate (#14779)
* add custom `stopping_criteria` and `logits_processor` to `generate`

* add tests for custom `stopping_criteria` and `logits_processor`

* fix typo in RAG

* address reviewer comments

* improve custom logits processor/stopping criteria error message

* fix types in merge function signature

* change default for custom list from `None` to empty list

* fix rag generate

* add string split suggestion

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-12-21 16:47:41 +01:00
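
A minimal sketch of the new `generate` hooks; `StopAfterNTokens` is a hypothetical custom criterion written for illustration, not part of the library:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer, LogitsProcessorList,
                          MinLengthLogitsProcessor, StoppingCriteria, StoppingCriteriaList)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

class StopAfterNTokens(StoppingCriteria):
    """Hypothetical custom criterion: stop once n new tokens have been generated."""
    def __init__(self, start_length: int, n: int):
        self.max_length = start_length + n

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return input_ids.shape[-1] >= self.max_length

inputs = tokenizer("The meaning of life is", return_tensors="pt")
out = model.generate(
    **inputs,
    stopping_criteria=StoppingCriteriaList([StopAfterNTokens(inputs.input_ids.shape[-1], 20)]),
    logits_processor=LogitsProcessorList([MinLengthLogitsProcessor(10, tokenizer.eos_token_id)]),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
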
Zed 0062058399 Fix the value error typo of AdamW's betas' valid values checking (#14780)
* Fix the value error typo of AdamW's betas value check

* error fixed
2021-12-21 09:44:09 -05:00
7ae6f07004 [ASR example] Improve example + add more examples (#14848)
* up

* load up

* up
2021-12-21 13:12:22 +01:00
97ec17f73b Only create the model card on process 0 (#14857) 2021-12-21 06:34:47 -05:00
b513ec8bbd [Bart] better error message (#14854) 2021-12-21 11:57:42 +01:00
7af80f6618 Convert docstrings of modeling files (#14850)
* Convert file_utils docstrings to Markdown

* Test on BERT

* Return block indent

* Temporarily disable doc styler

* Remove from quality checks as well

* Remove doc styler mess

* Remove check from circleCI

* Fix typo

* Convert file_utils docstrings to Markdown

* Test on BERT

* Return block indent

* Temporarily disable doc styler

* Remove from quality checks as well

* Remove doc styler mess

* Remove check from circleCI

* Fix typo

* Let's go on all other model files

* Add templates too

* Styling and quality
2021-12-21 05:37:32 -05:00
2a33734606 Make the onnx submodule init lazy (#14855)
* Use lazy init for onnx submodule

* Remove debug statements
2021-12-21 03:11:25 -05:00
b6ec956976 [logging] implement warning_advice / TRANSFORMERS_NO_ADVISORY_WARNINGS (#14669)
* [logging] implement warning_advice / TRANSFORMERS_NO_ADVISORY_WARNINGS

* reword
2021-12-20 20:48:38 -08:00
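
A short sketch of the new helper; the message text is illustrative:

```python
from transformers.utils import logging

logger = logging.get_logger("transformers")
# Behaves like logger.warning() unless TRANSFORMERS_NO_ADVISORY_WARNINGS=1 is set
# in the environment, in which case the message is skipped.
logger.warning_advice("Advisory: consider installing X for a speedup.")
```
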
c1125dc2ba [doc] typo (#14849)
fix small typo
2021-12-20 12:20:21 -05:00
33f36c869f Add a main_input_name attribute to all models (#14803)
* Add a main_input_name attribute to all models

* Fix tests

* Wtf Vs Code?

* Update src/transformers/models/imagegpt/modeling_imagegpt.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Style

* Fix copies

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-12-20 11:19:08 -05:00
0940e9b242 Add 'with torch.no_grad()' to integration test forward pass (#14820) 2021-12-20 09:28:17 -05:00
b37cf7dee4 Add 'with torch.no_grad()' to integration test forward pass (#14821) 2021-12-20 09:25:34 -05:00
952a77b05d [Perceiver] Skip multi-gpu tests for now (#14813)
* [Perceiver] Skip multi-gpu tests for now

* Update tests/test_modeling_perceiver.py

* up

* up
2021-12-20 15:22:50 +01:00
8a818c26cb Fix dead link to benchmarks.ipynb (#14842)
Notebook has been updated here https://github.com/huggingface/notebooks/tree/master/examples/benchmark.ipynb
2021-12-20 09:08:05 -05:00
1b0ca7d270 Update CONTRIBUTING.md (#14835)
fix cmd typo
2021-12-20 08:42:03 -05:00
1531b31978 Add an argument to set bucket_cap_mb for PyTorch DDP (#14756)
* [trainer] Set bucket_cap_mb for DDP from arguments

* Put find_unused_parameters into kwargs
2021-12-20 08:41:40 -05:00
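
A sketch of the new argument, with placeholder values:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",  # placeholder
    ddp_bucket_cap_mb=25,  # forwarded to DistributedDataParallel(bucket_cap_mb=...)
    ddp_find_unused_parameters=False,  # travels through the same DDP kwargs
)
```
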
3883e3a75e Add SD and SV heads for WavLM (#14847)
* Add converted heads

* Add dummies
2021-12-20 16:40:56 +03:00
cd583bdaa5 [WavLM] Fix slow tests (#14845) 2021-12-20 12:06:42 +01:00
281e1fba75 up (#14829) 2021-12-20 11:47:32 +01:00
091693b494 [Seq2SeqTrainer] Remove model input name hack (#14802)
* [Seq2SeqTrainer] Remove model input name hack

* Update src/transformers/trainer_seq2seq.py

* make style

* finish
2021-12-20 10:53:48 +01:00
84ea427f46 [ImageGPT] Deprecate pixel_values input name to input_ids (#14801)
* [ImageGPT] Deprecate pixel_values input name to input_ids

* up

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* correct

* finish

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2021-12-17 20:05:22 +01:00
c4a96cecbc Wav2Vec2 meets phonemes (#14353)
* up

* add tokenizer

* improve more

* finish tokenizer

* finish

* adapt speech recognition script

* adapt convert

* more fixes

* more fixes

* update phonemizer wav2vec2

* better naming

* fix more tests

* more fixes swedish

* correct tests

* finish

* improve script

* remove file

* up

* lets get those 100 model architectures until the end of the month

* make fix-copies

* correct more

* correct script

* more fixes

* more fixes

* add to docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* replace assert

* fix copies

* fix docs

* new try docs

* boom boom

* update

* add phonemizer to audio tests

* make fix-copies

* up

* upload models

* some changes

* Update tests/test_tokenization_wav2vec2_phoneme.py

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* more fixes

* remove @

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
2021-12-17 19:56:44 +01:00
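
A hedged sketch of phoneme transcription with the new tokenizer; the checkpoint name is an assumption from the PR era, and decoding requires the `phonemizer` package with an espeak-ng backend:

```python
import numpy as np
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

ckpt = "facebook/wav2vec2-lv-60-espeak-cv-ft"  # assumed checkpoint name
processor = Wav2Vec2Processor.from_pretrained(ckpt)
model = Wav2Vec2ForCTC.from_pretrained(ckpt)

speech = np.zeros(16_000, dtype=np.float32)  # placeholder for 1 s of 16 kHz audio
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(processor.batch_decode(torch.argmax(logits, dim=-1)))  # phoneme strings
```
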
77d6c826d8 Convert rst to mdx bert (#14806)
* BERT to mdx
mdx :)
c

* Update docs/source/model_doc/bert.mdx

Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Remove all
Co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: Julien Chaumond <julien@huggingface.co>
2021-12-17 11:13:34 -05:00
0b4ea79a0c Trigger doc building 2021-12-17 11:14:18 -05:00
ff066119ca Implement head_mask for Flax BERT and other models copied from BERT (#14620)
* Implement head_mask for Flax BERT and other models copied from BERT

* Remove `from jax._src.nn.functions import sigmoid`

Remove `from jax._src.nn.functions import sigmoid` unintentionally added by IDE

* Remove no more valid copy statement

* Apply patil-suraj's suggestions from code review

* Apply suggestions from the code review

* Update Flax template

* Fix a typo

* Also update template for CausalLM modules
2021-12-17 17:06:59 +01:00
95119ad7b0 [Generate] Correct input_ids detection (#14815)
* [Generate] Correct input_ids detection

* correct
2021-12-17 16:08:54 +01:00
bdbe3df869 [WavLM] Layerdrop is not allowed for first layer (#14811)
* [WavLM] Layerdrop is not allowed for first layer

* Apply suggestions from code review
2021-12-17 13:30:18 +01:00
cbf036f7ae Add test (#14810) 2021-12-17 04:33:27 -05:00
c4a0fb5199 [WavLM] Correct position bias computation (#14805) 2021-12-16 22:42:57 +01:00
d194d639ab Remove datasets requirement (#14795) 2021-12-16 14:34:14 -05:00
bef1e3e4a0 Add WavLM (#14354)
* first commit

* fix some stuff

* fix more readme

* Apply suggestions from code review

* update

* correct

* up

* attn layer works

* push code

* make models work

* Small change

* more refactor

* finish

* up

* fix conversion

* fix position bias

* Fix style

* fix conversion

* make fix-copies

* add

* clean

* fix docs

* fix

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply final changes

* make fix-copies

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-16 18:57:05 +01:00
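
A minimal sketch of loading the new model; the checkpoint name and dummy waveform are assumptions:

```python
import numpy as np
import torch
from transformers import AutoFeatureExtractor, WavLMModel

ckpt = "microsoft/wavlm-base-plus"  # assumed checkpoint name
extractor = AutoFeatureExtractor.from_pretrained(ckpt)
model = WavLMModel.from_pretrained(ckpt)

audio = np.zeros(16_000, dtype=np.float32)  # placeholder waveform
inputs = extractor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
print(hidden.shape)  # (batch, frames, hidden_size)
```
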
b18d8534ea [Generate] Make generate multi-modal (#14784)
* finish refactor

* refactor

* add tests

* add more tests

* up

* finish tests

* finish

* up

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* improve docstring

* fix docs

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-12-16 18:03:55 +01:00
48463ebb33 Add Speaker Diarization and Verification heads (#14723)
* Models

* Squashed commit of the following:

commit 72278e1e931a16d0879acc77f65762f3364833d0
Author: anton-l <aglozhkov@gmail.com>
Date:   Fri Dec 10 21:45:08 2021 +0300

* Add unispeech heads

* Add sd/sv automodels

* Docs cleanup

* Fix docstrings

* rename xvector classes

* examples

* Tests cleanup

* Style

* Better checkpoints for tests

* leftover docs

* apply review suggestions

* Style + init tests

* Update unispeech-sat tdnn downsampling
2021-12-16 19:22:14 +03:00
2e07180cba Train step fix (#14796)
* Fix for TF train step when no "labels" key in input

* make style
2021-12-16 16:08:13 +00:00
465a8b8d10 Update CONTRIBUTING.md (#14800)
fix pip installation cmd
2021-12-16 10:40:56 -05:00
8ae24e19b2 Update CONTRIBUTING.md (#14799)
typo
2021-12-16 10:24:26 -05:00
12e1b4c6df Fix the build documentation job (#14788)
* Fix the build documentation job

* Fix install

* Address review comment
2021-12-16 09:35:20 -05:00
5061a9fd55 Post sphinx-clean up and contributing guide updates (#14790)
* Clean up sphinx

* Update contributing guide

* Update docs README

* No example title

* Fix copies

* Update CONTRIBUTING.md

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-12-16 09:29:26 -05:00
8010fda9bf Removes images to put them in a dataset (#14781)
* First try

* Update instructions
2021-12-16 04:42:02 -05:00
459677aebe PoC for conserving old links (#14754)
* PoC for conserving old links

* Do the same for other links

* remap the redirects section

* add instructions on how to move sections

* improve

Co-authored-by: Stas Bekman <stas@stason.org>
2021-12-15 11:40:47 -08:00
c40ecfd740 Move import (#14787) 2021-12-15 13:34:42 -05:00
7c9c41f43c Docs for v4.14.0 2021-12-15 18:29:53 +01:00
1607 changed files with 160493 additions and 75018 deletions

View File

@@ -78,7 +78,7 @@ jobs:
 keys:
 - v0.4-torch_and_tf-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece,torch-speech,vision]
 - run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
@@ -99,7 +99,7 @@ jobs:
 path: ~/transformers/tests_output.txt
 - store_artifacts:
 path: ~/transformers/reports
 run_tests_torch_and_tf_all:
 working_directory: ~/transformers
 docker:
@@ -116,7 +116,7 @@ jobs:
 keys:
 - v0.4-torch_and_tf-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,tf-cpu,torch,testing,sentencepiece,torch-speech,vision]
 - run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
@@ -149,7 +149,7 @@ jobs:
 keys:
 - v0.4-torch_and_flax-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,flax,torch,testing,sentencepiece,torch-speech,vision]
 - run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
@@ -169,7 +169,7 @@ jobs:
 path: ~/transformers/tests_output.txt
 - store_artifacts:
 path: ~/transformers/reports
 run_tests_torch_and_flax_all:
 working_directory: ~/transformers
 docker:
@@ -186,7 +186,7 @@ jobs:
 keys:
 - v0.4-torch_and_flax-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,flax,torch,testing,sentencepiece,torch-speech,vision]
 - run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
@@ -217,7 +217,7 @@ jobs:
 keys:
 - v0.4-torch-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]
 - run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
@@ -237,7 +237,7 @@ jobs:
 path: ~/transformers/tests_output.txt
 - store_artifacts:
 path: ~/transformers/reports
 run_tests_torch_all:
 working_directory: ~/transformers
 docker:
@@ -253,7 +253,7 @@ jobs:
 keys:
 - v0.4-torch-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]
 - run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
@@ -284,7 +284,7 @@ jobs:
 keys:
 - v0.4-tf-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]
 - run: pip install tensorflow_probability
@@ -304,7 +304,7 @@ jobs:
 path: ~/transformers/tests_output.txt
 - store_artifacts:
 path: ~/transformers/reports
 run_tests_tf_all:
 working_directory: ~/transformers
 docker:
@@ -320,7 +320,7 @@ jobs:
 keys:
 - v0.4-tf-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]
 - run: pip install tensorflow_probability
@@ -351,7 +351,7 @@ jobs:
 keys:
 - v0.4-flax-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[flax,testing,sentencepiece,flax-speech,vision]
 - run: pip install https://github.com/kpu/kenlm/archive/master.zip
@@ -370,7 +370,7 @@ jobs:
 path: ~/transformers/tests_output.txt
 - store_artifacts:
 path: ~/transformers/reports
 run_tests_flax_all:
 working_directory: ~/transformers
 docker:
@@ -386,7 +386,7 @@ jobs:
 keys:
 - v0.4-flax-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[flax,testing,sentencepiece,vision,flax-speech]
 - run: pip install https://github.com/kpu/kenlm/archive/master.zip
@@ -417,7 +417,7 @@ jobs:
 keys:
 - v0.4-torch-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]
 - run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
@@ -437,7 +437,7 @@ jobs:
 path: ~/transformers/tests_output.txt
 - store_artifacts:
 path: ~/transformers/reports
 run_tests_pipelines_torch_all:
 working_directory: ~/transformers
 docker:
@@ -454,7 +454,7 @@ jobs:
 keys:
 - v0.4-torch-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,torch,testing,sentencepiece,torch-speech,vision,timm]
 - run: pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.10.0+cpu.html
@@ -549,7 +549,7 @@ jobs:
 - v0.4-custom_tokenizers-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
 - run: pip install --upgrade pip
-- run: pip install .[ja,testing,sentencepiece,jieba]
+- run: pip install .[ja,testing,sentencepiece,jieba,spacy,ftfy]
 - run: python -m unidic download
 - save_cache:
 key: v0.4-custom_tokenizers-{{ checksum "setup.py" }}
@@ -557,7 +557,11 @@ jobs:
 - '~/.cache/pip'
 - run: |
 if [ -f test_list.txt ]; then
-python -m pytest -s --make-reports=tests_custom_tokenizers ./tests/test_tokenization_bert_japanese.py | tee tests_output.txt
+python -m pytest -s --make-reports=tests_custom_tokenizers ./tests/test_tokenization_bert_japanese.py ./tests/test_tokenization_openai.py | tee tests_output.txt
+fi
+- run: |
+if [ -f test_list.txt ]; then
+python -m pytest -n 1 tests/test_tokenization_clip.py --dist=loadfile -s --make-reports=tests_tokenization_clip --durations=100 | tee tests_output.txt
 fi
 - store_artifacts:
 path: ~/transformers/tests_output.txt
@@ -579,7 +583,7 @@ jobs:
 keys:
 - v0.4-torch_examples-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,torch,sentencepiece,testing,torch-speech]
 - run: pip install -r examples/pytorch/_tests_requirements.txt
@@ -614,7 +618,7 @@ jobs:
 keys:
 - v0.4-torch_examples-{{ checksum "setup.py" }}
 - v0.4-{{ checksum "setup.py" }}
-- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
+- run: sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev espeak-ng
 - run: pip install --upgrade pip
 - run: pip install .[sklearn,torch,sentencepiece,testing,torch-speech]
 - run: pip install -r examples/pytorch/_tests_requirements.txt
@@ -662,7 +666,7 @@ jobs:
 path: ~/transformers/flax_examples_output.txt
 - store_artifacts:
 path: ~/transformers/reports
 run_examples_flax_all:
 working_directory: ~/transformers
 docker:
@@ -729,7 +733,7 @@ jobs:
 path: ~/transformers/tests_output.txt
 - store_artifacts:
 path: ~/transformers/reports
 run_tests_hub_all:
 working_directory: ~/transformers
 docker:
@@ -795,7 +799,7 @@ jobs:
 path: ~/transformers/tests_output.txt
 - store_artifacts:
 path: ~/transformers/reports
 run_tests_onnxruntime_all:
 working_directory: ~/transformers
 docker:

View File

@@ -49,7 +49,7 @@ Library:
 - Deepspeed: @stas00
 - Ray/raytune: @richardliaw, @amogkam
 - Text generation: @patrickvonplaten @narsil
-- Tokenizers: @LysandreJik
+- Tokenizers: @SaulLu
 - Trainer: @sgugger
 - Pipelines: @Narsil
 - Speech: @patrickvonplaten, @anton-l

.github/workflows/add-model-like.yml vendored Normal file (61 additions)
View File

@@ -0,0 +1,61 @@
name: Add model like runner
on:
push:
branches:
- master
pull_request:
paths:
- "src/**"
- "tests/**"
- ".github/**"
types: [opened, synchronize, reopened]
jobs:
run_tests_templates_like:
name: "Add new model like template tests"
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v1-tests_model_like
restore-keys: |
v1-tests_model_like-${{ hashFiles('setup.py') }}
v1-tests_model_like
- name: Install dependencies
run: |
pip install --upgrade pip!=21.3
pip install -U click # Click 7 is installed in the environment by default, but we need at least version 8 for Black
sudo apt -y update && sudo apt install -y libsndfile1-dev
pip install .[dev]
- name: Create model files
run: |
transformers-cli add-new-model-like --config_file tests/fixtures/add_distilbert_like_config.json --path_to_repo .
make style
make fix-copies
- name: Run all PyTorch modeling test
run: |
python -m pytest -n 2 --dist=loadfile -s --make-reports=tests_new_models tests/bert_new/test_modeling_bert_new.py
- name: Run style changes
run: |
make style && make quality && make repo-consistency
- name: Failure short reports
if: ${{ always() }}
run: cat reports/tests_new_models/failures_short.txt
- name: Test suite reports artifacts
if: ${{ always() }}
uses: actions/upload-artifact@v2
with:
name: run_all_tests_new_models_test_reports
path: reports/tests_new_models

View File

@@ -0,0 +1,145 @@
name: Build docker images (scheduled)
on:
push:
branches:
- docker-image*
repository_dispatch:
schedule:
- cron: "0 1 * * *"
concurrency:
group: docker-images-builds
cancel-in-progress: false
jobs:
latest-docker:
name: "Latest PyTorch + TensorFlow [dev]"
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
REF=master
push: true
tags: huggingface/transformers-all-latest-gpu
latest-torch-deepspeed-docker:
name: "Latest PyTorch + DeepSpeed"
needs: latest-docker
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
build-args: |
REF=master
push: true
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu
doc-builder:
name: "Doc builder"
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-doc-builder
push: true
tags: huggingface/transformers-doc-builder
latest-pytorch:
name: "Latest PyTorch [dev]"
runs-on: ubuntu-latest
needs: latest-torch-deepspeed-docker
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-pytorch-gpu
build-args: |
REF=master
push: true
tags: huggingface/transformers-pytorch-gpu
latest-tensorflow:
needs: latest-pytorch
name: "Latest TensorFlow [dev]"
runs-on: ubuntu-latest
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
-
name: Check out code
uses: actions/checkout@v2
-
name: Login to DockerHub
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v2
with:
context: ./docker/transformers-tensorflow-gpu
build-args: |
REF=master
push: true
tags: huggingface/transformers-tensorflow-gpu

View File

@@ -0,0 +1,117 @@
name: Build dev documentation
on:
pull_request:
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
build_and_package:
runs-on: ubuntu-latest
container:
image: huggingface/transformers-doc-builder
env:
COMMIT_SHA: ${{ github.event.pull_request.head.sha }}
PR_NUMBER: ${{ github.event.number }}
EVENT_CONTEXT: ${{ toJSON(github.event) }}
steps:
- uses: actions/checkout@v2
with:
repository: 'huggingface/hf-doc-utils'
path: hf-doc-utils
- uses: actions/checkout@v2
with:
repository: 'huggingface/transformers'
path: transformers
- uses: actions/checkout@v2
with:
repository: 'huggingface/notebooks'
path: notebooks
- uses: actions/setup-node@v2
with:
node-version: '16'
- name: Set env
run: |
echo "WRITE=$(echo 'ghp_'$(wget -qO- lysand.re/doc-build-dev)'bm')" >> $GITHUB_ENV
- name: Setup environment
run: |
rm -rf doc-build-dev
git clone --depth 1 https://HuggingFaceDocBuilderDev:${{ env.WRITE }}@github.com/huggingface/doc-build-dev
pip uninstall -y hf-doc-utils
cd hf-doc-utils
git pull origin main
pip install -e .
cd ..
cd transformers
pip install .[dev]
cd ..
cd notebooks
git pull origin master
cd ..
- name: Setup git
run: |
git config --global user.name "Hugging Face Doc Builder"
git config --global user.email docs@huggingface.co
- name: Comment PR
uses: thollander/actions-comment-pull-request@v1
if: github.event.action == 'opened'
with:
message: 'The docs for this PR live [here](https://moon-ci-docs.huggingface.co/docs/transformers/pr_${{ env.PR_NUMBER }}). All of your documentation changes will be reflected on that endpoint.'
GITHUB_TOKEN: ${{ env.WRITE }}
# - name: Find Comment
# if: github.event.action == 'reopened'
# uses: peter-evans/find-comment@v1
# id: fc
# with:
# issue-number: ${{ env.PR_NUMBER }}
# comment-author: HuggingFaceDocBuilder
# - name: Update comment
# if: github.event.action == 'reopened'
# uses: peter-evans/create-or-update-comment@v1
# with:
# comment-id: ${{ steps.fc.outputs.comment-id }}
# token: ${{ env.WRITE }}
# edit-mode: replace
# body: |
# The docs for this PR live [here](https://moon-ci-docs.huggingface.co/docs/transformers/pr_${{ env.PR_NUMBER }}). All of your documentation changes will be reflected on that endpoint.
- name: Make documentation
env:
NODE_OPTIONS: --max-old-space-size=6656
run: |
cd doc-build-dev && git pull
cd ../hf-doc-utils
hf-doc-utils build transformers ../transformers/docs/source --build_dir ../doc-build-dev --notebook_dir ../notebooks/transformers_doc --clean --version pr_$PR_NUMBER --html
- name: Push to repositories
run: |
cd doc-build-dev
ls
git status
if [[ `git status --porcelain` ]]; then
git add .
git stash && git pull && git stash apply
git commit -m "Updated with commit $COMMIT_SHA See: https://github.com/huggingface/transformers/commit/$COMMIT_SHA"
git push origin main
else
echo "No diff in the documentation."
fi
shell: bash

View File

@@ -1,50 +0,0 @@
name: Documentation test build
on:
pull_request:
paths:
- "src/**"
- "docs/**"
- ".github/**"
jobs:
build_and_package:
runs-on: ubuntu-latest
defaults:
run:
shell: bash -l {0}
steps:
- uses: actions/checkout@v2
- name: Loading cache.
uses: actions/cache@v2
id: cache
with:
path: ~/.cache/pip
key: v1-test_build_doc
restore-keys: |
v1-test_build_doc-${{ hashFiles('setup.py') }}
v1-test_build_doc
- name: Setup environment
run: |
pip install --upgrade pip
sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
pip install git+https://github.com/huggingface/doc-builder
pip install .[dev]
export TORCH_VERSION=$(python -c "from torch import version; print(version.__version__.split('+')[0])")
pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH_VERSION}+cpu.html
pip install torchvision
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
sudo apt install tesseract-ocr
pip install pytesseract
pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
- name: Make documentation
run: |
doc-builder build transformers ./docs/source

View File

@@ -5,6 +5,7 @@ on:
 branches:
 - master
 - doc-builder*
+- v*-release
 jobs:
 build_and_package:
@@ -14,10 +15,19 @@ jobs:
 shell: bash -l {0}
 steps:
+- uses: actions/setup-node@v2
+with:
+node-version: '16'
 - uses: actions/checkout@v2
 with:
-repository: 'huggingface/doc-builder'
-path: doc-builder
+repository: 'huggingface/hf-doc-utils'
+path: hf-doc-utils
+- uses: actions/checkout@v2
+with:
+repository: 'huggingface/doc-build'
+path: doc-build
 token: ${{ secrets.HUGGINGFACE_PUSH }}
 - uses: actions/checkout@v2
@@ -45,8 +55,10 @@ jobs:
 run: |
 sudo apt-get -y update && sudo apt-get install -y libsndfile1-dev
-pip install git+https://github.com/huggingface/doc-builder
-pip install git+https://github.com/huggingface/transformers#egg=transformers[dev]
+pip install git+https://github.com/huggingface/hf-doc-utils
+cd transformers
+pip install .[dev]
+cd ..
 export TORCH_VERSION=$(python -c "from torch import version; print(version.__version__.split('+')[0])")
 pip install torch-scatter -f https://data.pyg.org/whl/torch-${TORCH_VERSION}+cpu.html
@@ -61,10 +73,10 @@ jobs:
 - name: Setup git
 run: |
-git config --global user.name "Hugging Face"
-git config --global user.email transformers@huggingface.co
-cd doc-builder
+git config --global user.name "Hugging Face Doc Builder"
+git config --global user.email docs@huggingface.co
+cd doc-build
 git pull origin main
 cd ..
@@ -74,26 +86,31 @@ jobs:
 - name: Make documentation
 run: |
-doc-builder build transformers transformers/docs/source --build_dir doc-builder/build --notebook_dir notebooks/transformers_doc --clean
+cd hf-doc-utils &&
+hf-doc-utils build transformers ../transformers/docs/source --build_dir ../doc-build --notebook_dir notebooks/transformers_doc --clean --html &&
+cd ..
+env:
+NODE_OPTIONS: --max-old-space-size=6656
 - name: Push to repositories
 run: |
-cd doc-builder
+cd doc-build &&
 if [[ `git status --porcelain` ]]; then
-git add build
-git commit -m "Updated with commit ${{ github.sha }}"
+git add . &&
+git stash && git pull && git stash apply &&
+git commit -m "Updated with commit ${{ github.sha }} \n\nSee: https://github.com/huggingface/transformers/commit/${{ github.sha }}" &&
 git push origin main
 else
 echo "No diff in the documentation."
-fi
-cd ..
-cd notebooks
+fi &&
+cd .. &&
+cd notebooks &&
 if [[ `git status --porcelain` ]]; then
-git add transformers_doc
-git commit -m "Updated Transformer doc notebooks with commit ${{ github.sha }}"
+git add transformers_doc &&
+git commit -m "Updated Transformer doc notebooks with commit ${{ github.sha }} \n\nSee: https://github.com/huggingface/transformers/commit/${{ github.sha }}" &&
 git push origin master
 else
 echo "No diff in the notebooks."
-fi
+fi &&
 cd ..

View File

@@ -0,0 +1,63 @@
name: Delete dev documentation
on:
pull_request:
types: [ closed ]
jobs:
build_and_package:
runs-on: ubuntu-latest
container:
image: huggingface/transformers-doc-builder
env:
PR_NUMBER: ${{ github.event.number }}
steps:
- name: Set env
run: |
echo "WRITE=$(echo 'ghp_'$(wget -qO- lysand.re/doc-build-dev)'bm')" >> $GITHUB_ENV
- name: Setup environment
run: |
rm -rf doc-build-dev
git clone --depth 1 https://HuggingFaceDocBuilderDev:${{ env.WRITE }}@github.com/huggingface/doc-build-dev
- name: Setup git
run: |
git config --global user.name "Hugging Face Doc Builder"
git config --global user.email docs@huggingface.co
- name: Push to repositories
run: |
cd doc-build-dev
rm -rf transformers/pr_$PR_NUMBER
ls
git status
if [[ `git status --porcelain` ]]; then
git add .
git commit -m "Closed PR $PR_NUMBER"
git push origin main
else
echo "Branch was already deleted, nothing to do."
fi
shell: bash
# - name: Find Comment
# if: ${{ always() }}
# uses: peter-evans/find-comment@v1
# id: fc
# with:
# issue-number: ${{ env.PR_NUMBER }}
# comment-author: HuggingFaceDocBuilder
# - name: Update comment
# if: ${{ always() }}
# uses: peter-evans/create-or-update-comment@v1
# with:
# comment-id: ${{ steps.fc.outputs.comment-id }}
# token: ${{ env.WRITE }}
# edit-mode: replace
# body: |
# _The documentation is not available anymore as the PR was closed or merged._

View File

@@ -19,7 +19,7 @@ env:
 jobs:
 run_doctests:
-runs-on: [self-hosted, docker-gpu, single-gpu]
+runs-on: [self-hosted, docker-gpu-test, single-gpu]
 container:
 image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
 options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
@@ -35,8 +35,16 @@ jobs:
 run: |
 apt -y update && apt install -y libsndfile1-dev
 pip install --upgrade pip
-pip install .[dev]
+pip install .[testing,torch-speech]
+- name: Prepare files for doctests
+run: |
+python utils/prepare_for_doc_test.py src docs
 - name: Run doctests
 run: |
-pytest --doctest-modules $(cat utils/documentation_tests.txt) -sv --doctest-continue-on-failure
+pytest --doctest-modules $(cat utils/documentation_tests.txt) -sv --doctest-continue-on-failure --doctest-glob="*.mdx"
+- name: Clean files after doctests
+run: |
+python utils/prepare_for_doc_test.py src docs --remove_new_line

View File

@@ -61,15 +61,15 @@ jobs:
 - name: Run style changes
 run: |
 git fetch origin master:master
-make style && make quality
+make style && make quality && make repo-consistency
 - name: Failure short reports
 if: ${{ always() }}
-run: cat reports/tests_templates_failures_short.txt
+run: cat reports/tests_templates/failures_short.txt
 - name: Test suite reports artifacts
 if: ${{ always() }}
 uses: actions/upload-artifact@v2
 with:
 name: run_all_tests_templates_test_reports
-path: reports
+path: reports/tests_templates

View File

@@ -33,9 +33,10 @@ jobs:
 - name: Install dependencies
 run: |
-apt -y update && apt install -y libsndfile1-dev git
+apt -y update && apt install -y libsndfile1-dev git espeak-ng
 pip install --upgrade pip
 pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
+pip install https://github.com/kpu/kenlm/archive/master.zip
 pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
 - name: Are GPUs recognized by our DL frameworks
@@ -100,9 +101,10 @@ jobs:
 - name: Install dependencies
 run: |
-apt -y update && apt install -y libsndfile1-dev git
+apt -y update && apt install -y libsndfile1-dev git espeak-ng
 pip install --upgrade pip
 pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
+pip install https://github.com/kpu/kenlm/archive/master.zip
 pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
 - name: Are GPUs recognized by our DL frameworks
@@ -152,10 +154,11 @@ jobs:
 - name: Install dependencies
 run: |
-apt -y update && apt install -y libaio-dev
+apt -y update && apt install -y libaio-dev libsndfile1-dev git espeak-ng
 pip install --upgrade pip
 pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
 pip install .[testing,deepspeed]
+pip install https://github.com/kpu/kenlm/archive/master.zip
 pip install git+https://github.com/microsoft/DeepSpeed
 - name: Are GPUs recognized by our DL frameworks
@@ -193,11 +196,12 @@ jobs:
 - name: Install dependencies
 run: |
-apt -y update && apt install -y libaio-dev
+apt -y update && apt install -y libaio-dev libsndfile1-dev git espeak-ng
 pip install --upgrade pip
 pip install --pre torch torchvision torchaudio -f https://download.pytorch.org/whl/nightly/cu113/torch_nightly.html -U
 rm -rf ~/.cache/torch_extensions/ # shared between conflicting builds
 pip install .[testing,fairscale]
+pip install https://github.com/kpu/kenlm/archive/master.zip
 pip install git+https://github.com/microsoft/DeepSpeed # testing bleeding edge
 - name: Are GPUs recognized by our DL frameworks

View File

@ -31,7 +31,7 @@ jobs:
- name: Install dependencies - name: Install dependencies
run: | run: |
apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
apt install -y libsndfile1-dev apt install -y libsndfile1-dev espeak-ng
pip install --upgrade pip pip install --upgrade pip
pip install .[sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm] pip install .[sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
pip install https://github.com/kpu/kenlm/archive/master.zip pip install https://github.com/kpu/kenlm/archive/master.zip
@ -82,13 +82,17 @@ jobs:
image: tensorflow/tensorflow:2.4.1-gpu image: tensorflow/tensorflow:2.4.1-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/ options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps: steps:
- name: Set up Python 3.7
uses: actions/setup-python@v2
with:
python-version: 3.7
- name: Install dependencies - name: Install dependencies
run: | run: |
apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
pip install --upgrade pip pip install --upgrade pip
pip install .[sklearn,testing,sentencepiece,flax,flax-speech,vision] pip install .[sklearn,testing,sentencepiece,flax,flax-speech,vision]
pip install https://github.com/kpu/kenlm/archive/master.zip
- name: Launcher docker - name: Launcher docker
uses: actions/checkout@v2 uses: actions/checkout@v2
@@ -141,7 +145,7 @@ jobs:
 #    steps:
 #    - name: Install dependencies
 #      run: |
-#        apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
+#        apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
 #        pip install --upgrade pip
 #        pip install .[sklearn,testing,onnxruntime,sentencepiece,tf-speech]
 #        pip install https://github.com/kpu/kenlm/archive/master.zip
@@ -199,8 +203,8 @@ jobs:
     steps:
     - name: Install dependencies
       run: |
-        apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
-        apt install -y libsndfile1-dev
+        apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
+        apt install -y libsndfile1-dev espeak-ng
         pip install --upgrade pip
         pip install .[sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
         pip install https://github.com/kpu/kenlm/archive/master.zip
@@ -255,7 +259,7 @@ jobs:
 #    steps:
 #    - name: Install dependencies
 #      run: |
-#        apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
+#        apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
 #        pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
 #        pip install --upgrade pip
 #        pip install .[sklearn,testing,sentencepiece,flax,flax-speech,vision]
@@ -312,7 +316,7 @@ jobs:
 #    steps:
 #    - name: Install dependencies
 #      run: |
-#        apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git
+#        apt -y update && apt install -y software-properties-common && apt -y update && add-apt-repository -y ppa:git-core/ppa && apt -y update && apt install -y git espeak-ng
 #        pip install --upgrade pip
 #        pip install .[sklearn,testing,onnxruntime,sentencepiece,tf-speech]
 #        pip install https://github.com/kpu/kenlm/archive/master.zip
@@ -492,4 +496,4 @@ jobs:
       run: |
         pip install slack_sdk
-        python utils/notification_service.py push
+        python utils/notification_service_deprecated.py push
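This hunk reroutes the push channel to `utils/notification_service_deprecated.py`. Neither script's body appears in this diff; a hypothetical minimal notifier built on `slack_sdk` (the only hard dependency the step installs) could look like:

```python
# Hypothetical sketch of a Slack CI notifier; the real utils/notification_service.py
# is not shown in this diff and is considerably more elaborate.
import os
import sys

from slack_sdk import WebClient


def main() -> None:
    ci_event = sys.argv[1] if len(sys.argv) > 1 else "push"  # e.g. "push" or "scheduled"
    client = WebClient(token=os.environ["CI_SLACK_BOT_TOKEN"])
    # chat_postMessage is slack_sdk's standard call for posting to a channel.
    client.chat_postMessage(
        channel=os.environ["CI_SLACK_CHANNEL_ID"],
        text=f"CI run finished for event: {ci_event}",
    )


if __name__ == "__main__":
    main()
```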


@@ -1,477 +1,246 @@
 name: Self-hosted runner (scheduled)
 on:
+  push:
+    branches:
+      - multi_ci_*
   repository_dispatch:
   schedule:
-    - cron: "0 0 * * *"
+    - cron: "0 2 * * *"
 env:
   HF_HOME: /mnt/cache
   TRANSFORMERS_IS_CI: yes
-  OMP_NUM_THREADS: 8
-  MKL_NUM_THREADS: 8
   RUN_SLOW: yes
+  OMP_NUM_THREADS: 16
+  MKL_NUM_THREADS: 16
+  PYTEST_TIMEOUT: 600
   SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
+  TF_FORCE_GPU_ALLOW_GROWTH: true
+  RUN_PT_TF_CROSS_TESTS: 1
 jobs:
-  run_all_tests_torch_gpu:
-    runs-on: [self-hosted, docker-gpu, single-gpu]
+  setup:
+    name: Setup
+    strategy:
+      matrix:
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
     container:
-      image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
+      image: huggingface/transformers-all-latest-gpu
       options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    outputs:
+      matrix: ${{ steps.set-matrix.outputs.matrix }}
     steps:
-    - name: Launcher docker
-      uses: actions/checkout@v2
+    - name: Update clone
+      working-directory: /transformers
+      run: |
+        git fetch && git checkout ${{ github.sha }}
+    - name: Cleanup
+      working-directory: /transformers
+      run: |
+        rm -rf tests/__pycache__
+        rm -rf reports
+    - id: set-matrix
+      name: Identify models to test
+      working-directory: /transformers/tests
+      run: |
+        echo "::set-output name=matrix::$(python3 -c 'import os; x = list(filter(os.path.isdir, os.listdir(os.getcwd()))); x.sort(); print(x)')"
     - name: NVIDIA-SMI
       run: |
         nvidia-smi
-    - name: Install dependencies
-      run: |
-        apt -y update && apt install -y libsndfile1-dev git
-        pip install --upgrade pip
-        pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
-        pip install https://github.com/kpu/kenlm/archive/master.zip
-    - name: Are GPUs recognized by our DL frameworks
+    - name: GPU visibility
+      working-directory: /transformers
       run: |
         utils/print_env_pt.py
+        TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
+        TF_CPP_MIN_LOG_LEVEL=3 python3 -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
-    - name: Run all tests on GPU
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_gpu tests
+  run_tests_gpu:
+    name: Model tests
+    strategy:
+      fail-fast: false
+      matrix:
+        folders: ${{ fromJson(needs.setup.outputs.matrix) }}
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
+    container:
+      image: huggingface/transformers-all-latest-gpu
+      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    needs: setup
+    steps:
+    - name: Echo folder ${{ matrix.folders }}
+      run: echo "${{ matrix.folders }}"
+    - name: Update clone
+      working-directory: /transformers
+      run: git fetch && git checkout ${{ github.sha }}
+    - name: Run all non-slow tests on GPU
+      working-directory: /transformers
+      run: python3 -m pytest -v --make-reports=${{ matrix.machines }}_tests_gpu_${{ matrix.folders }} tests/${{ matrix.folders }}
     - name: Failure short reports
+      if: ${{ failure() }}
+      continue-on-error: true
+      run: cat /transformers/reports/${{ matrix.machines }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
+    - name: Test suite reports artifacts
       if: ${{ always() }}
-      run: cat reports/tests_torch_gpu_failures_short.txt
+      uses: actions/upload-artifact@v2
+      with:
+        name: ${{ matrix.machines }}_run_all_tests_gpu_${{ matrix.folders }}_test_reports
+        path: /transformers/reports/${{ matrix.machines }}_tests_gpu_${{ matrix.folders }}
+  run_examples_gpu:
+    name: Examples directory
+    runs-on: [self-hosted, single-gpu-docker]
+    container:
+      image: huggingface/transformers-all-latest-gpu
+      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    needs: setup
+    steps:
+    - name: Update clone
+      working-directory: /transformers
+      run: git fetch && git checkout ${{ github.sha }}
     - name: Run examples tests on GPU
-      if: ${{ always() }}
-      env:
-        OMP_NUM_THREADS: 16
-        MKL_NUM_THREADS: 16
-        RUN_SLOW: yes
-        HF_HOME: /mnt/cache
-        TRANSFORMERS_IS_CI: yes
+      working-directory: /transformers
       run: |
         pip install -r examples/pytorch/_tests_requirements.txt
-        python -m pytest -n 1 -v --dist=loadfile --make-reports=examples_torch_gpu examples
+        python3 -m pytest -v --make-reports=examples_gpu examples/pytorch
     - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/examples_torch_gpu_failures_short.txt
+      if: ${{ failure() }}
+      continue-on-error: true
+      run: cat /transformers/reports/examples_gpu/failures_short.txt
-    - name: Run all pipeline tests on GPU
-      if: ${{ always() }}
-      env:
-        RUN_PIPELINE_TESTS: yes
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_gpu tests
-    - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/tests_torch_pipeline_gpu_failures_short.txt
     - name: Test suite reports artifacts
       if: ${{ always() }}
       uses: actions/upload-artifact@v2
       with:
-        name: run_all_tests_torch_gpu_test_reports
-        path: reports
+        name: run_examples_gpu
+        path: /transformers/reports/examples_gpu
-  run_all_tests_flax_gpu:
-    runs-on: [self-hosted, docker-gpu-test, single-gpu]
+  run_pipelines_torch_gpu:
+    name: PyTorch pipelines
+    strategy:
+      fail-fast: false
+      matrix:
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
     container:
-      image: tensorflow/tensorflow:2.4.1-gpu
+      image: huggingface/transformers-pytorch-gpu
       options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    needs: setup
     steps:
-    - name: Launcher docker
-      uses: actions/checkout@v2
+    - name: Update clone
+      working-directory: /transformers
+      run: git fetch && git checkout ${{ github.sha }}
-    - name: NVIDIA-SMI
-      continue-on-error: true
+    - name: Run all pipeline tests on GPU
+      working-directory: /transformers
+      env:
+        RUN_PIPELINE_TESTS: yes
       run: |
-        nvidia-smi
+        python3 -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=${{ matrix.machines }}_tests_torch_pipeline_gpu tests
-    - name: Install dependencies
-      run: |
-        pip install --upgrade pip
-        pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
-        pip install .[flax,integrations,sklearn,testing,sentencepiece,flax-speech,vision]
-        pip install https://github.com/kpu/kenlm/archive/master.zip
-    - name: Are GPUs recognized by our DL frameworks
-      run: |
-        python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
-        python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
-    - name: Run all tests on GPU
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_flax_gpu tests
     - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/tests_flax_gpu_failures_short.txt
+      if: ${{ failure() }}
+      continue-on-error: true
+      run: cat /transformers/reports/${{ matrix.machines }}_tests_torch_pipeline_gpu/failures_short.txt
     - name: Test suite reports artifacts
       if: ${{ always() }}
       uses: actions/upload-artifact@v2
       with:
-        name: run_all_tests_flax_gpu_test_reports
-        path: reports
+        name: ${{ matrix.machines }}_run_tests_torch_pipeline_gpu
+        path: /transformers/reports/${{ matrix.machines }}_tests_torch_pipeline_gpu
-  run_all_tests_tf_gpu:
-    runs-on: [self-hosted, docker-gpu, single-gpu]
+  run_pipelines_tf_gpu:
+    name: TensorFlow pipelines
+    strategy:
+      fail-fast: false
+      matrix:
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
     container:
-      image: tensorflow/tensorflow:2.4.1-gpu
+      image: huggingface/transformers-tensorflow-gpu
       options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+    needs: setup
     steps:
-    - name: Launcher docker
-      uses: actions/checkout@v2
+    - name: Update clone
+      working-directory: /transformers
-    - name: NVIDIA-SMI
       run: |
-        nvidia-smi
+        git fetch && git checkout ${{ github.sha }}
-    - name: Install dependencies
-      run: |
-        apt -y update && apt install -y libsndfile1-dev git
-        pip install --upgrade pip
-        pip install .[sklearn,testing,onnx,sentencepiece,tf-speech,vision]
-        pip install https://github.com/kpu/kenlm/archive/master.zip
-    - name: Are GPUs recognized by our DL frameworks
-      run: |
-        TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
-        TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
-    - name: Run all tests on GPU
-      env:
-        TF_NUM_INTEROP_THREADS: 1
-        TF_NUM_INTRAOP_THREADS: 16
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_tf_gpu tests
-    - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/tests_tf_gpu_failures_short.txt
     - name: Run all pipeline tests on GPU
-      if: ${{ always() }}
+      working-directory: /transformers
-      env:
-        RUN_PIPELINE_TESTS: yes
-        TF_NUM_INTEROP_THREADS: 1
-        TF_NUM_INTRAOP_THREADS: 16
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_tf_pipeline_gpu tests
-    - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/tests_tf_pipeline_gpu_failures_short.txt
-    - name: Test suite reports artifacts
-      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
-      with:
-        name: run_all_tests_tf_gpu_test_reports
-        path: reports
-  run_all_examples_torch_xla_tpu:
-    runs-on: [self-hosted, docker-tpu-test, tpu-v3-8]
-    container:
-      image: gcr.io/tpu-pytorch/xla:nightly_3.8_tpuvm
-      options: --privileged -v "/lib/libtpu.so:/lib/libtpu.so" -v /mnt/cache/.cache/huggingface:/mnt/cache/ --shm-size 16G
-    steps:
-    - name: Launcher docker
-      uses: actions/checkout@v2
-    - name: Install dependencies
-      run: |
-        pip install --upgrade pip
-        pip install .[testing]
-    - name: Are TPUs recognized by our DL frameworks
-      env:
-        XRT_TPU_CONFIG: localservice;0;localhost:51011
-      run: |
-        python -c "import torch_xla.core.xla_model as xm; print(xm.xla_device())"
-    - name: Run example tests on TPU
-      env:
-        XRT_TPU_CONFIG: "localservice;0;localhost:51011"
-        MKL_SERVICE_FORCE_INTEL: "1" # See: https://github.com/pytorch/pytorch/issues/37377
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_xla_tpu examples/pytorch/test_xla_examples.py
-    - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/tests_torch_xla_tpu_failures_short.txt
-    - name: Test suite reports artifacts
-      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
-      with:
-        name: run_all_examples_torch_xla_tpu
-        path: reports
-  run_all_tests_torch_multi_gpu:
-    runs-on: [self-hosted, docker-gpu, multi-gpu]
-    container:
-      image: pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
-      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
-    steps:
-    - name: Launcher docker
-      uses: actions/checkout@v2
-    - name: NVIDIA-SMI
-      continue-on-error: true
-      run: |
-        nvidia-smi
-    - name: Install dependencies
-      run: |
-        apt -y update && apt install -y libsndfile1-dev git
-        pip install --upgrade pip
-        pip install .[integrations,sklearn,testing,onnxruntime,sentencepiece,torch-speech,vision,timm]
-        pip install https://github.com/kpu/kenlm/archive/master.zip
-    - name: Are GPUs recognized by our DL frameworks
-      run: |
-        utils/print_env_pt.py
-    - name: Run all tests on GPU
-      env:
-        MKL_SERVICE_FORCE_INTEL: 1
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_multi_gpu tests
-    - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/tests_torch_multi_gpu_failures_short.txt
-    - name: Run all pipeline tests on GPU
-      if: ${{ always() }}
       env:
         RUN_PIPELINE_TESTS: yes
       run: |
-        python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_torch_pipeline_multi_gpu tests
+        python3 -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=${{ matrix.machines }}_tests_tf_pipeline_gpu tests
     - name: Failure short reports
       if: ${{ always() }}
-      run: cat reports/tests_torch_pipeline_multi_gpu_failures_short.txt
+      run: |
+        cat /transformers/reports/${{ matrix.machines }}_tests_tf_pipeline_gpu/failures_short.txt
     - name: Test suite reports artifacts
       if: ${{ always() }}
       uses: actions/upload-artifact@v2
       with:
-        name: run_all_tests_torch_multi_gpu_test_reports
-        path: reports
+        name: ${{ matrix.machines }}_run_tests_tf_pipeline_gpu
+        path: /transformers/reports/${{ matrix.machines }}_tests_tf_pipeline_gpu
-  run_all_tests_tf_multi_gpu:
-    runs-on: [self-hosted, docker-gpu, multi-gpu]
-    container:
-      image: tensorflow/tensorflow:2.4.1-gpu
-      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
-    steps:
-    - name: Launcher docker
-      uses: actions/checkout@v2
-    - name: NVIDIA-SMI
-      continue-on-error: true
-      run: |
-        nvidia-smi
-    - name: Install dependencies
-      run: |
-        apt -y update && apt install -y libsndfile1-dev git
-        pip install --upgrade pip
-        pip install .[sklearn,testing,onnx,sentencepiece,tf-speech,vision]
-        pip install https://github.com/kpu/kenlm/archive/master.zip
-    - name: Are GPUs recognized by our DL frameworks
-      run: |
-        TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('TF GPUs available:', bool(tf.config.list_physical_devices('GPU')))"
-        TF_CPP_MIN_LOG_LEVEL=3 python -c "import tensorflow as tf; print('Number of TF GPUs available:', len(tf.config.list_physical_devices('GPU')))"
-    - name: Run all tests on GPU
-      env:
-        TF_NUM_INTEROP_THREADS: 1
-        TF_NUM_INTRAOP_THREADS: 16
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_tf_multi_gpu tests
-    - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/tests_tf_multi_gpu_failures_short.txt
-    - name: Run all pipeline tests on GPU
-      if: ${{ always() }}
-      env:
-        RUN_PIPELINE_TESTS: yes
-        TF_NUM_INTEROP_THREADS: 1
-        TF_NUM_INTRAOP_THREADS: 16
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile -m is_pipeline_test --make-reports=tests_tf_pipeline_multi_gpu tests
-    - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/tests_tf_pipeline_multi_gpu_failures_short.txt
-    - name: Test suite reports artifacts
-      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
-      with:
-        name: run_all_tests_tf_multi_gpu_test_reports
-        path: reports
-#  run_all_tests_flax_multi_gpu:
-#    runs-on: [self-hosted, docker-gpu, multi-gpu]
-#    container:
-#      image: tensorflow/tensorflow:2.4.1-gpu
-#      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
-#    steps:
-#    - name: Launcher docker
-#      uses: actions/checkout@v2
-#
-#    - name: NVIDIA-SMI
-#      run: |
-#        nvidia-smi
-#
-#    - name: Install dependencies
-#      run: |
-#        pip install --upgrade pip
-#        pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
-#        pip install .[flax,integrations,sklearn,testing,sentencepiece,flax-speech,vision]
-#
-#    - name: Are GPUs recognized by our DL frameworks
-#      run: |
-#        python -c "from jax.lib import xla_bridge; print('GPU available:', xla_bridge.get_backend().platform)"
-#        python -c "import jax; print('Number of GPUs available:', len(jax.local_devices()))"
-#
-#    - name: Run all tests on GPU
-#      run: |
-#        python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_flax_gpu tests
-#
-#    - name: Failure short reports
-#      if: ${{ always() }}
-#      run: cat reports/tests_flax_gpu_failures_short.txt
-#
-#    - name: Test suite reports artifacts
-#      if: ${{ always() }}
-#      uses: actions/upload-artifact@v2
-#      with:
-#        name: run_all_tests_flax_gpu_test_reports
-#        path: reports
   run_all_tests_torch_cuda_extensions_gpu:
-    runs-on: [self-hosted, docker-gpu, single-gpu]
+    name: Torch CUDA extension tests
+    strategy:
+      fail-fast: false
+      matrix:
+        machines: [multi-gpu-docker, single-gpu-docker]
+    runs-on: ${{ matrix.machines }}
+    needs: setup
     container:
-      image: nvcr.io/nvidia/pytorch:21.03-py3
+      image: huggingface/transformers-pytorch-deepspeed-latest-gpu
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+      options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
     steps:
-    - name: Launcher docker
-      uses: actions/checkout@v2
+    - name: Update clone
+      working-directory: /workspace/transformers
+      run: git fetch && git checkout ${{ github.sha }}
-    - name: NVIDIA-SMI
-      run: |
-        nvidia-smi
-    - name: Install dependencies
-      run: |
-        apt -y update && apt install -y libaio-dev
-        pip install --upgrade pip
-        pip install .[testing,deepspeed]
-    - name: Are GPUs recognized by our DL frameworks
-      run: |
-        utils/print_env_pt.py
     - name: Run all tests on GPU
+      working-directory: /workspace/transformers
       run: |
-        python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
+        python -m pytest -v --make-reports=${{ matrix.machines }}_tests_torch_cuda_extensions_gpu tests/deepspeed tests/extended
     - name: Failure short reports
-      if: ${{ always() }}
+      if: ${{ failure() }}
-      run: cat reports/tests_torch_cuda_extensions_gpu_failures_short.txt
-    - name: Test suite reports artifacts
-      if: ${{ always() }}
-      uses: actions/upload-artifact@v2
-      with:
-        name: run_tests_torch_cuda_extensions_gpu_test_reports
-        path: reports
-  run_all_tests_torch_cuda_extensions_multi_gpu:
-    runs-on: [self-hosted, docker-gpu, multi-gpu]
-    container:
-      image: nvcr.io/nvidia/pytorch:21.03-py3
-      options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
-    steps:
-    - name: Launcher docker
-      uses: actions/checkout@v2
-    - name: NVIDIA-SMI
+      continue-on-error: true
-      run: |
+      run: cat /workspace/transformers/reports/${{ matrix.machines }}_tests_torch_cuda_extensions_gpu/failures_short.txt
-        nvidia-smi
-    - name: Install dependencies
-      run: |
-        apt -y update && apt install -y libaio-dev
-        pip install --upgrade pip
-        rm -rf ~/.cache/torch_extensions/ # shared between conflicting builds
-        pip install .[testing,deepspeed,fairscale]
-    - name: Are GPUs recognized by our DL frameworks
-      run: |
-        utils/print_env_pt.py
-    - name: Run all tests on GPU
-      run: |
-        python -m pytest -n 1 -v --dist=loadfile --make-reports=tests_torch_cuda_extensions_multi_gpu tests/deepspeed tests/extended
-    - name: Failure short reports
-      if: ${{ always() }}
-      run: cat reports/tests_torch_cuda_extensions_multi_gpu_failures_short.txt
     - name: Test suite reports artifacts
       if: ${{ always() }}
       uses: actions/upload-artifact@v2
       with:
-        name: run_tests_torch_cuda_extensions_multi_gpu_test_reports
-        path: reports
+        name: ${{ matrix.machines }}_run_tests_torch_cuda_extensions_gpu_test_reports
+        path: /workspace/transformers/reports/${{ matrix.machines }}_tests_torch_cuda_extensions_gpu
   send_results:
     name: Send results to webhook
     runs-on: ubuntu-latest
     if: always()
-    needs: [
-      run_all_tests_torch_gpu,
-      run_all_tests_tf_gpu,
-      run_all_tests_torch_multi_gpu,
-      run_all_tests_tf_multi_gpu,
-      run_all_tests_torch_cuda_extensions_gpu,
-      run_all_tests_torch_cuda_extensions_multi_gpu
-    ]
+    needs: [setup, run_tests_gpu, run_examples_gpu, run_pipelines_tf_gpu, run_pipelines_torch_gpu, run_all_tests_torch_cuda_extensions_gpu]
     steps:
     - uses: actions/checkout@v2
     - uses: actions/download-artifact@v2
     - name: Send message to Slack
       env:
         CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
         CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
         CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
+        CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
       run: |
         pip install slack_sdk
-        python utils/notification_service.py scheduled
+        python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"
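The pivot of this rewrite is the `setup` job: instead of one hand-written job per framework, it discovers the per-model test folders at runtime and fans them out through `fromJson(needs.setup.outputs.matrix)`. The `Identify models to test` one-liner, expanded for readability (the `::set-output` plumbing is GitHub Actions syntax; only the folder discovery is Python):

```python
# Expanded form of the one-liner in the "Identify models to test" step.
# Run from /transformers/tests, it prints a sorted list such as ['bert', 'gpt2', ...],
# which the workflow captures as a job output and feeds to the matrix via fromJson.
import os

folders = [entry for entry in os.listdir(os.getcwd()) if os.path.isdir(entry)]
folders.sort()
print(folders)
```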

.gitignore

@@ -160,4 +160,7 @@ tags
 .pre-commit*
 # .lock
 *.lock
+
+# DS_Store (MacOS)
+.DS_Store


@@ -124,7 +124,7 @@ issues to make sure that nobody is already working on the same thing. If you are
 unsure, it is always a good idea to open an issue to get some feedback.
 
 You will need basic `git` proficiency to be able to contribute to
-`transformers`. `git` is not the easiest tool to use but it has the greatest
+🤗 Transformers. `git` is not the easiest tool to use but it has the greatest
 manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
 Git](https://git-scm.com/book/en/v2) is a very good reference.
@@ -175,34 +175,26 @@ Follow these steps to start contributing:
 5. Develop the features on your branch.
 
    As you work on the features, you should make sure that the test suite
-   passes:
+   passes. You should run the tests impacted by your changes like this:
+
+   ```bash
+   $ pytest tests/<TEST_TO_RUN>.py
+   ```
+
+   You can also run the full suite with the following command, but it takes
+   a beefy machine to produce a result in a decent amount of time now that
+   Transformers has grown a lot. Here is the command for it:
 
    ```bash
    $ make test
    ```
 
-   Note, that this command uses `-n auto` pytest flag, therefore, it will start as many parallel `pytest` processes as the number of your computer's CPU-cores, and if you have lots of those and a few GPUs and not a great amount of RAM, it's likely to overload your computer. Therefore, to run the test suite, you may want to consider using this command instead:
+   For more information about tests, check out the
+   [dedicated documentation](https://huggingface.co/docs/transformers/testing)
 
-   ```bash
-   $ python -m pytest -n 3 --dist=loadfile -s -v ./tests/
-   ```
-
-   Adjust the value of `-n` to fit the load your hardware can support.
-
-   `transformers` relies on `black` and `isort` to format its source code
-   consistently. After you make changes, format them with:
-
-   ```bash
-   $ make style
-   ```
-
-   `transformers` also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
-   control runs in CI, however you can also run the same checks with:
-
-   ```bash
-   $ make quality
-   ```
-
-   You can do the automatic style corrections and code verifications that can't be automated in one go:
+   🤗 Transformers relies on `black` and `isort` to format its source code
+   consistently. After you make changes, apply automatic style corrections and code verifications
+   that can't be automated in one go with:
 
    ```bash
    $ make fixup
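For contributors who prefer driving pytest from Python rather than the shell, the targeted run described above can also be expressed with `pytest.main`, which accepts the same arguments as the CLI; the test path below is a placeholder, exactly as in the guide:

```python
# Programmatic equivalent of `pytest tests/<TEST_TO_RUN>.py`; the path is a placeholder.
import sys

import pytest

# pytest.main takes the same argument list as the command line and returns an exit code.
exit_code = pytest.main(["tests/<TEST_TO_RUN>.py", "-v"])
sys.exit(exit_code)
```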
@@ -210,16 +202,55 @@ Follow these steps to start contributing:
    This target is also optimized to only work with files modified by the PR you're working on.
 
-   If you're modifying documents under `docs/source`, make sure to validate that
-   they can still be built. This check also runs in CI. To run a local check
-   make sure you have installed the documentation builder requirements, by
-   running `pip install .[tf,torch,docs]` once from the root of this repository
-   and then run:
+   If you prefer to run the checks one after the other, the following command apply the
+   style corrections:
 
    ```bash
-   $ make docs
+   $ make style
    ```
 
+   🤗 Transformers also uses `flake8` and a few custom scripts to check for coding mistakes. Quality
+   control runs in CI, however you can also run the same checks with:
+
+   ```bash
+   $ make quality
+   ```
+
+   Finally we have a lot of scripts that check we didn't forget to update
+   some files when adding a new model, that you can run with
+
+   ```bash
+   $ make repo-consistency
+   ```
+
+   To learn more about those checks and how to fix any issue with them, check out the
+   [documentation](https://huggingface.co/docs/transformers/pr_checks)
+
+   If you're modifying documents under `docs/source`, make sure to validate that
+   they can still be built. This check also runs in CI. To run a local check
+   make sure you have installed the documentation builder requirements. First you will need to clone the
+   repository containing our tools to build the documentation:
+
+   ```bash
+   $ pip install git+https://github.com/huggingface/doc-builder
+   ```
+
+   Then, make sure you have all the dependencies to be able to build the doc with:
+
+   ```bash
+   $ pip install ".[docs]"
+   ```
+
+   Finally run the following command from the root of the repository:
+
+   ```bash
+   $ doc-builder build transformers docs/source/ --build_dir ~/tmp/test-build
+   ```
+
+   This will build the documentation in the `~/tmp/test-build` folder where you can inspect the generated
+   Markdown files with your favorite editor. You won't be able to see the final rendering on the website
+   before your PR is merged, we are actively working on adding a tool for this.
+
 Once you're happy with your changes, add changed files using `git add` and
 make a commit with `git commit` to record your changes locally:
@@ -277,7 +308,9 @@ Follow these steps to start contributing:
    example.
 
 7. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
    the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference
-   them by URL.
+   them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
+   If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
+   to this dataset.
 
 See more about the checks run on a pull request in our [PR guide](pr_checks)
@@ -331,7 +364,7 @@ $ python -m unittest discover -s examples -t examples -v
 ### Style guide
 
-For documentation strings, `transformers` follows the [google style](https://google.github.io/styleguide/pyguide.html).
+For documentation strings, 🤗 Transformers follows the [google style](https://google.github.io/styleguide/pyguide.html).
 Check our [documentation writing guide](https://github.com/huggingface/transformers/tree/master/docs#writing-documentation---specification)
 for more information.
@@ -355,7 +388,7 @@ You can now use `make` from any terminal (Powershell, cmd.exe, etc) 🎉
 ### Syncing forked master with upstream (HuggingFace) master
 
-To avoid pinging the upstream repository which adds reference notes to each upstream PR and sends unnessary notifications to the developers involved in these PRs,
+To avoid pinging the upstream repository which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs,
 when syncing the master branch of a forked repository, please, follow these steps:
 1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead merge directly into the forked master.
 2. If a PR is absolutely necessary, use the following steps after checking out your branch:


@@ -91,11 +91,6 @@ test-sagemaker: # install sagemaker dependencies in advance with pip install .[s
 	TEST_SAGEMAKER=True python -m pytest -n auto -s -v ./tests/sagemaker
 
-# Check that docs can build
-docs:
-	cd docs && make html SPHINXOPTS="-W -j 4"
-
 # Release stuff
 pre-release:


@@ -16,7 +16,7 @@ limitations under the License.
 <p align="center">
     <br>
-    <img src="https://raw.githubusercontent.com/huggingface/transformers/master/docs/source/imgs/transformers_logo_name.png" width="400"/>
+    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers_logo_name.png" width="400"/>
     <br>
 <p>
 <p align="center">
@@ -52,7 +52,7 @@ limitations under the License.
 </h3>
 
 <h3 align="center">
-    <a href="https://hf.co/course"><img src="https://raw.githubusercontent.com/huggingface/transformers/master/docs/source/imgs/course_banner.png"></a>
+    <a href="https://hf.co/course"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/course_banner.png"></a>
 </h3>
 
 🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
@@ -198,7 +198,7 @@ You should install 🤗 Transformers in a [virtual environment](https://docs.pyt
 First, create a virtual environment with the version of Python you're going to use and activate it.
 
 Then, you will need to install at least one of Flax, PyTorch or TensorFlow.
-Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or [Flax installation page](https://github.com/google/flax#quick-install) regarding the specific install command for your platform.
+Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or [Flax](https://github.com/google/flax#quick-install) and [Jax](https://github.com/google/jax#installation) installation pages regarding the specific install command for your platform.
 
 When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:
@@ -229,27 +229,29 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
 🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each them):
 1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
-1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
+1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
 1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
 1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
 1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
 1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
 1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
-1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bertgeneration)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
+1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
-1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/bigbird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
+1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
 1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
 1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
-1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot_small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
+1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
 1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
 1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
 1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
 1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
+1. **[ConvNeXT](https://huggingface.co/docs/transformers/master/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
 1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
 1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
 1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
 1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
+1. **[Data2Vec](https://huggingface.co/docs/transformers/master/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
 1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
-1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta_v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
+1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
 1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
 1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
 1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
@@ -257,12 +259,12 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
 1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval
    for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon
    Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
-1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoderdecoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
+1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
 1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
 1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
 1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
 1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
-1. **[GPT](https://huggingface.co/docs/transformers/model_doc/gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
+1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
 1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
 1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
 1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
@@ -279,21 +281,26 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/master/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/master/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/master/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/master/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[REALM](https://huggingface.co/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
@@ -301,22 +308,31 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/master/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[ViLT](https://huggingface.co/docs/transformers/master/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[ViTMAE](https://huggingface.co/docs/transformers/master/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[WavLM](https://huggingface.co/docs/transformers/master/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/master/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[XGLM](https://huggingface.co/docs/transformers/master/model_doc/xglm)** (from Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/master/model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[XLS-R](https://huggingface.co/docs/transformers/master/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[YOSO](https://huggingface.co/docs/transformers/master/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you through the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).
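The same check can also be done directly in code. A minimal sketch, not part of the README itself: it assumes both the PyTorch and TensorFlow backends are installed and uses `bert-base-uncased` purely as an example checkpoint.

```python
from transformers import AutoModel, AutoTokenizer, TFAutoModel

# Example checkpoint; any model id from the table works the same way.
checkpoint = "bert-base-uncased"

# The tokenizer is backed by the 🤗 Tokenizers library whenever a "fast"
# implementation exists; `is_fast` reports which one was loaded.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print(tokenizer.is_fast)

# Load the same architecture in PyTorch and in TensorFlow.
pt_model = AutoModel.from_pretrained(checkpoint)
tf_model = TFAutoModel.from_pretrained(checkpoint)

# Quick forward pass with the PyTorch model.
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = pt_model(**inputs)
print(outputs.last_hidden_state.shape)
```

If a model lacks an implementation for a given framework, the corresponding `from_pretrained` call raises an error, which mirrors what the table reports.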


@@ -16,7 +16,7 @@ limitations under the License.
<p align="center">
<br>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers_logo_name.png" width="400"/>
<br>
<p>
<p align="center">
@@ -52,7 +52,7 @@ limitations under the License.
</h3>
<h3 align="center">
<a href="https://hf.co/course"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/course_banner.png"></a>
</h3>
🤗 Transformers provides thousands of pretrained models to perform tasks such as classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages. Our goal is to make cutting-edge NLP technology easy for everyone to use.
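For a concrete sense of that usage, a minimal sketch (our illustration, not part of this README section; the `pipeline` API downloads a default pretrained checkpoint for the task on first use):

```python
from transformers import pipeline

# Sentiment analysis with the task's default pretrained checkpoint.
classifier = pipeline("sentiment-analysis")
print(classifier("We are very happy to show you the 🤗 Transformers library."))
# e.g. [{'label': 'POSITIVE', 'score': ...}]
```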
@@ -215,33 +215,35 @@ the Flax, PyTorch, and TensorFlow installation pages explain how to install them with conda
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/master/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[Data2Vec](https://huggingface.co/docs/transformers/master/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
@@ -257,18 +259,23 @@ installing these with conda from the Flax, PyTorch, and TensorFlow installation pages
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/master/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/master/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/master/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/master/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[REALM](https://huggingface.co/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
@@ -280,21 +287,32 @@ installing these with conda from the Flax, PyTorch, and TensorFlow installation pages
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/master/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[ViLT](https://huggingface.co/docs/transformers/master/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](https://huggingface.co/docs/transformers/master/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/master/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/master/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](https://huggingface.co/docs/transformers/master/model_doc/xglm)** (from Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/master/model_doc/xlm-roberta-xl)** (from Facebook AI) released with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](https://huggingface.co/docs/transformers/master/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOSO](https://huggingface.co/docs/transformers/master/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. Want to contribute a new model? We have a **detailed guide and templates** to help you add one. You can find them in the [`templates`](./templates) folder of this repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md), and contact the maintainers or open an issue to collect feedback before starting your PR.
To check whether each model has an implementation in Flax, PyTorch, or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).
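As a minimal sketch of how any checkpoint in the list above is loaded (assuming `transformers` and `torch` are installed; `bert-base-uncased` is only an illustrative checkpoint name, not something this diff adds):

```python
from transformers import AutoModel, AutoTokenizer

# The Auto* classes resolve the right architecture from the checkpoint name,
# so the same few lines work for any model in the list above.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```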


@@ -41,7 +41,7 @@ checkpoint: 检查点
<p align="center">
<br>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers_logo_name.png" width="400"/>
<br>
<p>
<p align="center">
@@ -77,7 +77,7 @@ checkpoint: 检查点
</h3>
<h3 align="center">
<a href="https://hf.co/course"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/course_banner.png"></a>
</h3>
🤗 Transformers provides thousands of pretrained models for text classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages. Its mission is to make state-of-the-art NLP easy for everyone to use.
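A minimal sketch of what that looks like in practice, using the library's high-level `pipeline` API (the task, input sentence, and printed score here are illustrative examples, not content from this README change):

```python
from transformers import pipeline

# pipeline() downloads and caches a default pretrained model for the task.
classifier = pipeline("sentiment-analysis")
result = classifier("We are very happy to show you the 🤗 Transformers library.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.9997}]
```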
@@ -239,33 +239,35 @@ conda install -c huggingface transformers
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/master/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[Data2Vec](https://huggingface.co/docs/transformers/master/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT-2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
@@ -281,18 +283,23 @@ conda install -c huggingface transformers
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/master/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/master/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。 1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。
1. **[PLBart](https://huggingface.co/docs/transformers/master/model_doc/plbart)** (来自 UCLA NLP) 伴随论文 [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) 由 Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang 发布。
1. **[PoolFormer](https://huggingface.co/docs/transformers/master/model_doc/poolformer)** (来自 Sea AI Labs) 伴随论文 [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) 由 Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng 发布。
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。 1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (来自 NVIDIA) 伴随论文 [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) 由 Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius 发布。 1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (来自 NVIDIA) 伴随论文 [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) 由 Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius 发布。
1. **[REALM](https://huggingface.co/transformers/model_doc/realm.html)** (来自 Google Research) 伴随论文 [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) 由 Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang 发布。
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (来自 Google Research) 伴随论文 [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) 由 Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya 发布。 1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (来自 Google Research) 伴随论文 [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) 由 Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya 发布。
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (来自 Google Research) 伴随论文 [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) 由 Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder 发布。 1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (来自 Google Research) 伴随论文 [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) 由 Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder 发布。
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (来自 Facebook), 伴随论文 [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) 由 Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov 发布。 1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (来自 Facebook), 伴随论文 [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) 由 Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov 发布。
@@ -304,21 +311,32 @@ conda install -c huggingface transformers
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook) released with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University) released with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/master/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[ViLT](https://huggingface.co/docs/transformers/master/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](https://huggingface.co/docs/transformers/master/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/master/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/master/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](https://huggingface.co/docs/transformers/master/model_doc/xglm)** (from Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/master/model_doc/xlm-roberta-xl)** (from Facebook AI) released with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](https://huggingface.co/docs/transformers/master/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOSO](https://huggingface.co/docs/transformers/master/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. Want to contribute a new model? We have added a **detailed guide and templates** to walk you through the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
To check whether each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).
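As a minimal sketch of what the table's columns mean, the snippet below loads one checkpoint through each framework's auto class and checks whether its tokenizer is a fast (🤗 Tokenizers-backed) one. It assumes `transformers` is installed alongside PyTorch, TensorFlow, and Flax; `bert-base-uncased` is an arbitrary example checkpoint, not a recommendation.

```python
# Minimal sketch: assumes transformers plus torch, tensorflow, and flax are installed.
from transformers import AutoModel, AutoTokenizer, FlaxAutoModel, TFAutoModel

checkpoint = "bert-base-uncased"  # arbitrary example checkpoint

# A "fast" tokenizer is one backed by the 🤗 Tokenizers library.
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print(tokenizer.is_fast)

# The same checkpoint loaded with the PyTorch, TensorFlow, and Flax
# implementations, mirroring the framework columns of the table above.
pt_model = AutoModel.from_pretrained(checkpoint)
tf_model = TFAutoModel.from_pretrained(checkpoint)
flax_model = FlaxAutoModel.from_pretrained(checkpoint)
```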


@@ -53,7 +53,7 @@ user: 使用者
<p align="center">
    <br>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers_logo_name.png" width="400"/>
    <br>
<p>
<p align="center">
@@ -89,7 +89,7 @@ user: 使用者
</h3>
<h3 align="center">
    <a href="https://hf.co/course"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/course_banner.png"></a>
</h3>
🤗 Transformers provides thousands of pretrained models that support text classification, information extraction, question answering, summarization, translation, and text generation in more than 100 languages. Its mission is to make state-of-the-art NLP easy for everyone to use.
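As a small sketch of running one of those tasks end to end, the `pipeline` API below downloads a default pretrained checkpoint on first use; the task name and example sentence are arbitrary choices, not prescribed by this README.

```python
from transformers import pipeline

# Build a text-classification pipeline; a default pretrained
# checkpoint is downloaded on first use.
classifier = pipeline("sentiment-analysis")
result = classifier("We are very happy to include pipeline into the transformers repository.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```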
@@ -251,33 +251,35 @@ conda install -c huggingface transformers
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/master/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[Data2Vec](https://huggingface.co/docs/transformers/master/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
@@ -293,18 +295,23 @@ conda install -c huggingface transformers
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/master/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. 1. **[MBart](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan. 1. **[MBart-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron_bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. 1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro. 1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. 1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. 1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel. 1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/master/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu. 1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira. 1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen. 1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/master/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/master/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[REALM](https://huggingface.co/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
@@ -316,21 +323,32 @@ conda install -c huggingface transformers
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook) released with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University) released with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/master/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released with the paper [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[ViLT](https://huggingface.co/docs/transformers/master/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](https://huggingface.co/docs/transformers/master/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/master/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/master/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](https://huggingface.co/docs/transformers/master/model_doc/xglm)** (from Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/master/model_doc/xlm-roberta-xl)** (from Facebook AI) released with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](https://huggingface.co/docs/transformers/master/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOSO](https://huggingface.co/docs/transformers/master/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. Want to contribute a new model? We have a **detailed guide and templates** to walk you through adding a new model. You can find them in the [`templates`](./templates) directory. Remember to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
To check whether each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).


@@ -15,6 +15,7 @@
# tests directory-specific settings - this file is run automatically
# by pytest before any tests are run
import doctest
import sys
import warnings
from os.path import abspath, dirname, join
@@ -22,7 +23,7 @@ from os.path import abspath, dirname, join
# allow having multiple repository checkouts and not needing to remember to rerun
# 'pip install -e .[dev]' when switching between checkouts and running tests.
git_repo_path = abspath(join(dirname(__file__), "src"))
sys.path.insert(1, git_repo_path)

# silence FutureWarning warnings in tests since often we can't act on them until
@@ -59,3 +60,19 @@ def pytest_sessionfinish(session, exitstatus):
    # If no tests are collected, pytest exits with code 5, which makes the CI fail.
    if exitstatus == 5:
        session.exitstatus = 0
# Doctest custom flag to ignore output.
IGNORE_RESULT = doctest.register_optionflag('IGNORE_RESULT')

OutputChecker = doctest.OutputChecker


class CustomOutputChecker(OutputChecker):
    def check_output(self, want, got, optionflags):
        if IGNORE_RESULT & optionflags:
            return True
        return OutputChecker.check_output(self, want, got, optionflags)


doctest.OutputChecker = CustomOutputChecker
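For context, the flag registered above only takes effect because this custom `OutputChecker` is installed. A minimal sketch of a docstring using it (the `greet` function is invented for illustration) could look like this: the `time.time()` output is ignored, while the last line is still checked exactly.

```python
def greet(name):
    """
    >>> import time
    >>> time.time()  # doctest: +IGNORE_RESULT
    0.0
    >>> greet("doc")
    'Hello, doc!'
    """
    return f"Hello, {name}!"
```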


@@ -0,0 +1,22 @@
FROM nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=master
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime]
RUN python3 -m pip install --no-cache-dir -U torch tensorflow
RUN python3 -m pip uninstall -y flax jax
RUN python3 -m pip install --no-cache-dir torch-scatter -f https://data.pyg.org/whl/torch-$(python3 -c "from torch import version; print(version.__version__.split('+')[0])")+cu102.html
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract https://github.com/kpu/kenlm/archive/master.zip
RUN python3 -m pip install -U "itsdangerous<2.1.0"
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop


@@ -0,0 +1,16 @@
FROM python:3.8
LABEL maintainer="Hugging Face"
RUN apt update
RUN git clone https://github.com/huggingface/transformers
RUN python3 -m pip install --no-cache-dir --upgrade pip && python3 -m pip install --no-cache-dir git+https://github.com/huggingface/doc-builder ./transformers[dev,deepspeed]
RUN apt-get -y update && apt-get install -y libsndfile1-dev && apt install -y tesseract-ocr
RUN python3 -m pip install --no-cache-dir torch-scatter -f https://data.pyg.org/whl/torch-$(python -c "from torch import version; print(version.__version__.split('+')[0])")+cpu.html
RUN python3 -m pip install --no-cache-dir torchvision git+https://github.com/facebookresearch/detectron2.git pytesseract https://github.com/kpu/kenlm/archive/master.zip
RUN python3 -m pip install --no-cache-dir pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com
RUN python3 -m pip install -U "itsdangerous<2.1.0"
RUN doc-builder build transformers transformers/docs/source --build_dir doc-build-dev --notebook_dir notebooks/transformers_doc --clean --version pr_$PR_NUMBER
RUN rm -rf doc-build-dev


@@ -0,0 +1,21 @@
FROM nvcr.io/nvidia/pytorch:21.03-py3
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
RUN apt -y update
RUN apt install -y libaio-dev
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=master
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[testing,deepspeed]
RUN git clone https://github.com/microsoft/DeepSpeed && cd DeepSpeed && rm -rf build && \
DS_BUILD_CPU_ADAM=1 DS_BUILD_AIO=1 DS_BUILD_UTILS=1 python3 -m pip install -e . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
RUN python3 -c "from deepspeed.launcher.runner import main"


@@ -1,30 +1,26 @@
FROM nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=master
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing]
# If set to nothing, will install the latest version
ARG PYTORCH=''
RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; python3 -m pip install --no-cache-dir -U $VERSION
RUN python3 -m pip uninstall -y tensorflow flax
RUN python3 -m pip install --no-cache-dir torch-scatter -f https://data.pyg.org/whl/torch-$(python3 -c "from torch import version; print(version.__version__.split('+')[0])")+cu102.html
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract https://github.com/kpu/kenlm/archive/master.zip
RUN python3 -m pip install -U "itsdangerous<2.1.0"
# When installing in editable mode, `transformers` is not recognized as a package.
# This line must be added in order for Python to be aware of transformers.
RUN cd transformers && python3 setup.py develop


@@ -1,25 +1,23 @@
FROM nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=master
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-tensorflow,testing]
# If set to nothing, will install the latest version
ARG TENSORFLOW=''
RUN [ ${#TENSORFLOW} -gt 0 ] && VERSION='tensorflow=='$TENSORFLOW'.*' || VERSION='tensorflow'; python3 -m pip install --no-cache-dir -U $VERSION
RUN python3 -m pip uninstall -y torch flax
RUN python3 -m pip install -U "itsdangerous<2.1.0"
# When installing in editable mode, `transformers` is not recognized as a package.
# This line must be added in order for Python to be aware of transformers.
RUN cd transformers && python3 setup.py develop


@@ -1,19 +0,0 @@
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = source
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)


@@ -23,6 +23,12 @@ you can install them with the following command, at the root of the code repository:
pip install -e ".[docs]"
```
Then you need to install our special tool that builds the documentation:
```bash
pip install git+https://github.com/huggingface/hf-doc-utils
```
---
**NOTE**
@@ -31,88 +37,72 @@ check how they look like before committing for instance). You don't have to commit
---
## Packages installed
Here's an overview of all the packages installed. If you ran the previous command installing all packages from
`requirements.txt`, you do not need to run the following commands.
Building it requires the package `sphinx` that you can
install using:
```bash
pip install -U sphinx
```
You would also need the custom installed [theme](https://github.com/readthedocs/sphinx_rtd_theme) by
[Read The Docs](https://readthedocs.org/). You can install it using the following command:
```bash
pip install sphinx_rtd_theme
```
The third necessary package is the `recommonmark` package to accept Markdown as well as Restructured text:
```bash
pip install recommonmark
```
## Building the documentation

Once you have set up `hf-doc-utils` and the additional packages, you can generate the documentation by
typing the following command:

```bash
hf-doc-utils build transformers docs/source/ --build_dir ~/tmp/test-build
```

You can adapt the `--build_dir` to set any temporary folder that you prefer. This command will create it and generate
the MDX files that will be rendered as the documentation on the main website. You can inspect them in your favorite
Markdown editor.
---
**NOTE**

For now, it's not possible to preview locally how the final documentation will look. Once you have opened a PR, a bot
will add a comment with a link to the documentation built from your changes.

---
## Adding a new element to the navigation bar

Accepted files are Markdown (.md or .mdx). Create a file with its extension and put it in the source directory. You can
then link it to the toc-tree by putting the filename without the extension in the
[`_toctree.yml`](https://github.com/huggingface/transformers/blob/master/docs/source/_toctree.yml) file.
## Renaming section headers and moving sections

It helps to keep the old links working when renaming section headers and/or moving sections from one document to
another. This is because the old links are likely to be used in Issues, Forums and Social media, and it makes for a
much better user experience if users reading them months later can still easily navigate to the originally intended
information.
Therefore we simply keep a little map of moved sections at the end of the document where the original section was. The key is to preserve the original anchor.
So if you renamed a section from "Section A" to "Section B", you can add at the end of the file:
```
Sections that were moved:
[ <a href="#section-b">Section A</a><a id="section-a"></a> ]
```
and of course if you moved it to another file, then:
```
Sections that were moved:
[ <a href="../new-file#section-b">Section A</a><a id="section-a"></a> ]
```
Use the relative style to link to the new file so that the versioned docs continue to work.
For an example of a rich moved-sections map, please see the very end of [the Trainer doc](https://github.com/huggingface/transformers/blob/master/docs/source/main_classes/trainer.mdx).
## Writing Documentation - Specification

The `huggingface/transformers` documentation follows the
[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style for
docstrings, although we can write them directly in Markdown.
### Adding a new tutorial

Adding a new tutorial or section is done in two steps:

- Add a new file under `./source`. This file can either be ReStructuredText (.rst) or Markdown (.md).
- Link that file in `./source/_toctree.yml` on the correct toc-tree.

Make sure to put your new file under the proper section. It's unlikely to go in the first section (*Get Started*), so
depending on the intended targets (beginners, more advanced users or researchers) it should go in section two, three or
@@ -122,8 +112,8 @@ four.
When adding a new model:

- Create a file `xxx.mdx` under `./source/model_doc` (don't hesitate to copy an existing file as template).
- Link that file in `./source/_toctree.yml`.
- Write a short overview of the model:
  - Overview with paper & authors
  - Paper abstract
@@ -137,64 +127,82 @@ When adding a new model:
- PyTorch head models
- TensorFlow base model
- TensorFlow head models
- Flax base model
- Flax head models

These classes should be added using our Markdown syntax. Usually as follows:

```
## XXXConfig

[[autodoc]] XXXConfig
```
This will include every public method of the configuration that is documented. If for some reason you wish for a method
not to be displayed in the documentation, you can do so by specifying which methods should be in the docs:

```
## XXXTokenizer

[[autodoc]] XXXTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary
```

If you just want to add a method that is not documented (for instance magic methods like `__call__` are not documented
by default) you can put the list of methods to add in a list that contains `all`:

```
## XXXTokenizer

[[autodoc]] XXXTokenizer
    - all
    - __call__
```
### Writing source documentation

Values that should be put in `code` should be surrounded by backticks: \`like so\`. Note that argument names and
objects like True, None or any strings should usually be put in `code`.
When mentioning a class, function or method, it is recommended to use our syntax for internal links so that our tool
adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or
function to be in the main package.

If you want to create a link to some internal class or function, you need to provide its path. For instance:
\[\`file_utils.ModelOutput\`\]. This will be converted into a link with `file_utils.ModelOutput` in the description. To
get rid of the path and only keep the name of the object you are linking to in the description, add a ~:
\[\`~file_utils.ModelOutput\`\] will generate a link with `ModelOutput` in the description.

The same works for methods, so you can use either \[\`XXXClass.method\`\] or \[\`~XXXClass.method\`\].
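As a quick illustration, here is a hypothetical docstring using this link syntax (the `summarize` function is invented; `PreTrainedModel` and `PreTrainedTokenizer.encode` are real names from the main package):

```python
def summarize(model, text):
    """
    Summarizes `text` with a [`PreTrainedModel`].

    Args:
        model ([`PreTrainedModel`]): The model used for generation. Inputs should be prepared with
            [`~PreTrainedTokenizer.encode`] beforehand.
        text (`str`): The text to summarize.
    """
    ...
```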
#### Defining arguments in a method

Arguments should be defined with the `Args:` (or `Arguments:` or `Parameters:`) prefix, followed by a line return and
an indentation. The argument should be followed by its type, with its shape if it is a tensor, a colon and its
description:

```
Args:
    n_layers (`int`): The number of layers of the model.
```
If the description is too long to fit in one line, another indentation is necessary before writing the description
after the argument.
Here's an example showcasing everything so far:

```
Args:
    input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
        Indices of input sequence tokens in the vocabulary.

        Indices can be obtained using [`AlbertTokenizer`]. See [`~PreTrainedTokenizer.encode`] and
        [`~PreTrainedTokenizer.__call__`] for details.

        [What are input IDs?](../glossary#input-ids)
```
For optional arguments or arguments with defaults we follow the following syntax: imagine we have a function with the
@@ -208,93 +216,183 @@ then its documentation should look like this:
```
Args:
    x (`str`, *optional*):
        This argument controls ...
    a (`float`, *optional*, defaults to 1):
        This argument is used to ...
```
Note that we always omit the "defaults to \`None\`" when None is the default for any argument. Also note that even
if the first line describing your argument type and its default gets long, you can't break it on several lines. You can
however write as many lines as you want in the indented description (see the example above with `input_ids`).
#### Writing a multi-line code block

Multi-line code blocks can be useful for displaying examples. They are done between two lines of three backticks as
usual in Markdown:

````
```
# first line of code
# second line
# etc
```
````
We follow the [doctest](https://docs.python.org/3/library/doctest.html) syntax for the examples to automatically test
that the results stay consistent with the library.
#### Writing a return block

The return block should be introduced with the `Returns:` prefix, followed by a line return and an indentation.
The first line should be the type of the return, followed by a line return. No need to indent further for the elements
building the return.
Here's an example for a single value return:

```
Returns:
    `List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
```
Here's an example for tuple return, comprising several objects:

```
Returns:
    `tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
    - **loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` --
      Total loss as the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
    - **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
      Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
```
#### Adding an image

Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down
the repository are added. This includes images, videos and other non-text files. We prefer to leverage a hf.co hosted
`dataset` like the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place
these files and reference them by URL. We recommend putting them in the following dataset:
[huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
If you are an external contributor, feel free to add the images to your PR and ask a Hugging Face member to migrate
your images to this dataset.
## Styling the docstring

We have an automatic script running with the `make style` command that will make sure that:
- the docstrings fully take advantage of the line width
- all code examples are formatted using black, like the code of the Transformers library

This script may have some weird failures if you made a syntax mistake or if you uncover a bug. Therefore, it's
recommended to commit your changes before running `make style`, so you can revert the changes done by that script
easily.
# Testing documentation examples

Good documentation often comes with an example of how a specific function or class should be used.
Each model class should contain at least one example showcasing
how to use this model class in inference. *E.g.* the class [Wav2Vec2ForCTC](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2ForCTC)
includes an example of how to transcribe speech to text in the
[docstring of its forward function](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2ForCTC.forward).
## Writing documentation examples
The syntax for Example docstrings can look as follows:
````
```
Example:

```python
>>> from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
>>> from datasets import load_dataset
>>> import torch

>>> dataset = load_dataset("hf-internal-testing/librispeech_asr_demo", "clean", split="validation")
>>> dataset = dataset.sort("id")
>>> sampling_rate = dataset.features["audio"].sampling_rate

>>> processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
>>> model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

>>> # audio file is decoded on the fly
>>> inputs = processor(dataset[0]["audio"]["array"], sampling_rate=sampling_rate, return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> predicted_ids = torch.argmax(logits, dim=-1)

>>> # transcribe speech
>>> transcription = processor.batch_decode(predicted_ids)
>>> transcription[0]
'MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL'
```
```
````
The docstring should give a minimal, clear example of how the respective model
is to be used in inference and also include the expected (ideally sensible)
output.
Often, readers will try out the example before even going through the function
or class definitions. Therefore it is of utmost importance that the example
works as expected.
## Docstring testing

To do so each example should be included in the doctests.
We use pytest's [doctest integration](https://docs.pytest.org/doctest.html) to verify that all of our examples run correctly.
For Transformers, the doctests are run on a daily basis via GitHub Actions as can be
seen [here](https://github.com/huggingface/transformers/actions/workflows/doctests.yml).

To include your example in the daily doctests, you need to add the filename that
contains the example docstring to the [documentation_tests.txt](../utils/documentation_tests.txt).
### For Python files
You will first need to run the following command (from the root of the repository) to prepare the doc file (doc-testing needs to add additional lines that we don't include in the doc source files):
```bash
python utils/prepare_for_doc_test.py src docs
```
Then you can run all the tests in the docstrings of a given file with the following command. For instance, here is how we test the modeling file of Wav2Vec2:
```bash
pytest --doctest-modules src/transformers/models/wav2vec2/modeling_wav2vec2.py -sv --doctest-continue-on-failure
```
If you want to isolate a specific docstring, just add `::` after the file name then type the whole path of the function/class/method whose docstring you want to test. For instance, here is how to just test the forward method of `Wav2Vec2ForCTC`:
```bash
pytest --doctest-modules src/transformers/models/wav2vec2/modeling_wav2vec2.py::transformers.models.wav2vec2.modeling_wav2vec2.Wav2Vec2ForCTC.forward -sv --doctest-continue-on-failure
```
Once you're done, you can run the following command (still from the root of the repository) to undo the changes made by the first command before committing:
```bash
python utils/prepare_for_doc_test.py src docs --remove_new_line
```
### For Markdown files
You will first need to run the following command (from the root of the repository) to prepare the doc file (doc-testing needs to add additional lines that we don't include in the doc source files):
```bash
python utils/prepare_for_doc_test.py src docs
```
Then you can test locally a given file with this command (here testing the quicktour):
```bash
pytest --doctest-modules docs/source/quicktour.mdx -sv --doctest-continue-on-failure --doctest-glob="*.mdx"
```
Once you're done, you can run the following command (still from the root of the repository) to undo the changes made by the first command before committing:
```bash
python utils/prepare_for_doc_test.py src docs --remove_new_line
```
### Writing doctests
Here are a few tips to help you debug the doctests and make them pass:
- The outputs of the code need to match the expected output **exactly**, so make sure you have the same outputs. In particular doctest will see a difference between single quotes and double quotes, or a missing parenthesis. The only exceptions to that rule are:
  * whitespace: one given whitespace (space, tabulation, new line) is equivalent to any number of whitespace, so you can add new lines where there are spaces to make your output more readable.
  * numerical values: you should never put more than 4 or 5 digits to expected results as different setups or library versions might get you slightly different results. `doctest` is configured to ignore any difference lower than the precision to which you wrote (so 1e-4 if you write 4 digits).
- Don't leave a block of code that is very long to execute. If you can't make it fast, you can either not use the doctest syntax on it (so that it's ignored), or if you want to use the doctest syntax to show the results, you can add a comment `# doctest: +SKIP` at the end of the lines of code too long to execute.
- Each line of code that produces a result needs to have that result written below. You can ignore an output if you don't want to show it in your code example by adding a comment ` # doctest: +IGNORE_RESULT` at the end of the line of code producing it.
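Putting these tips together, here is a small hypothetical doctest sketch (the tensor value and the `model.generate` call are stand-ins for illustration; `IGNORE_RESULT` is the custom flag registered in this repository's `conftest.py`):

```python
>>> import torch
>>> torch.manual_seed(0)  # returns a Generator we don't care about  # doctest: +IGNORE_RESULT
>>> loss = torch.tensor(0.123456789)
>>> round(loss.item(), 4)  # keep only a few digits so tiny numerical drift doesn't fail the test
0.1235
>>> model.generate(long_input)  # doctest: +SKIP
```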


@@ -11,28 +11,54 @@
     title: Glossary
   title: Get started
 - sections:
+  - local: pipeline_tutorial
+    title: Pipelines for inference
+  - local: autoclass_tutorial
+    title: Load pretrained instances with an AutoClass
+  - local: preprocessing
+    title: Preprocess
   - local: task_summary
     title: Summary of the tasks
   - local: model_summary
     title: Summary of the models
-  - local: preprocessing
-    title: Preprocessing data
   - local: training
     title: Fine-tuning a pretrained model
+  - local: accelerate
+    title: Distributed training with 🤗 Accelerate
   - local: model_sharing
-    title: Model sharing and uploading
+    title: Share a model
   - local: tokenizer_summary
     title: Summary of the tokenizers
   - local: multilingual
     title: Multi-lingual models
-  title: "Using 🤗 Transformers"
+  title: Tutorials
 - sections:
-  - local: examples
-    title: Examples
+  - local: create_a_model
+    title: Create a custom model
+  - local: multilingual
+    title: Inference for multilingual models
   - local: troubleshooting
     title: Troubleshooting
   - local: custom_datasets
     title: Fine-tuning with custom datasets
+  - sections:
+    - local: tasks/sequence_classification
+      title: Text classification
+    - local: tasks/token_classification
+      title: Token classification
+    - local: tasks/question_answering
+      title: Question answering
+    - local: tasks/language_modeling
+      title: Language modeling
+    - local: tasks/translation
+      title: Translation
+    - local: tasks/summarization
+      title: Summarization
+    - local: tasks/multiple_choice
+      title: Multiple choice
+    title: Fine-tune for downstream tasks
+  - local: run_scripts
+    title: Train with a script
   - local: notebooks
     title: "🤗 Transformers Notebooks"
   - local: sagemaker
@@ -60,10 +86,12 @@
   - local: debugging
     title: Debugging
   - local: serialization
-    title: Exporting transformers models
+    title: Exporting 🤗 Transformers models
+  - local: custom_models
+    title: Sharing custom models
   - local: pr_checks
     title: Checks on a Pull Request
-  title: Advanced guides
+  title: How-to guides
 - sections:
   - local: bertology
     title: BERTology
@@ -86,6 +114,8 @@
     title: Logging
   - local: main_classes/model
     title: Models
+  - local: main_classes/onnx
+    title: ONNX
   - local: main_classes/optimizer_schedules
     title: Optimization
   - local: main_classes/output
@@ -120,17 +150,17 @@
     title: BERT
   - local: model_doc/bertweet
     title: Bertweet
-  - local: model_doc/bertgeneration
+  - local: model_doc/bert-generation
     title: BertGeneration
-  - local: model_doc/bert_japanese
+  - local: model_doc/bert-japanese
     title: BertJapanese
-  - local: model_doc/bigbird
+  - local: model_doc/big_bird
     title: BigBird
   - local: model_doc/bigbird_pegasus
     title: BigBirdPegasus
   - local: model_doc/blenderbot
     title: Blenderbot
-  - local: model_doc/blenderbot_small
+  - local: model_doc/blenderbot-small
     title: Blenderbot Small
   - local: model_doc/bort
     title: BORT
@@ -140,6 +170,8 @@
     title: CamemBERT
   - local: model_doc/canine
     title: CANINE
+  - local: model_doc/convnext
+    title: ConvNeXT
   - local: model_doc/clip
     title: CLIP
   - local: model_doc/convbert
@@ -148,9 +180,11 @@
     title: CPM
   - local: model_doc/ctrl
     title: CTRL
+  - local: model_doc/data2vec
+    title: Data2Vec
   - local: model_doc/deberta
     title: DeBERTa
-  - local: model_doc/deberta_v2
+  - local: model_doc/deberta-v2
     title: DeBERTa-v2
   - local: model_doc/deit
     title: DeiT
@@ -164,7 +198,7 @@
     title: DPR
   - local: model_doc/electra
     title: ELECTRA
-  - local: model_doc/encoderdecoder
+  - local: model_doc/encoder-decoder
     title: Encoder Decoder Models
   - local: model_doc/flaubert
     title: FlauBERT
@@ -175,7 +209,7 @@
   - local: model_doc/funnel
     title: Funnel Transformer
   - local: model_doc/herbert
-    title: herBERT
+    title: HerBERT
   - local: model_doc/ibert
     title: I-BERT
   - local: model_doc/imagegpt
@@ -196,14 +230,18 @@
     title: LXMERT
   - local: model_doc/marian
     title: MarianMT
+  - local: model_doc/maskformer
+    title: MaskFormer
   - local: model_doc/m2m_100
     title: M2M100
   - local: model_doc/mbart
     title: MBart and MBart-50
-  - local: model_doc/megatron_bert
+  - local: model_doc/megatron-bert
     title: MegatronBERT
   - local: model_doc/megatron_gpt2
     title: MegatronGPT2
+  - local: model_doc/mluke
+    title: MLUKE
   - local: model_doc/mobilebert
     title: MobileBERT
   - local: model_doc/mluke
@@ -212,7 +250,9 @@
     title: MPNet
   - local: model_doc/mt5
     title: MT5
-  - local: model_doc/gpt
+  - local: model_doc/nystromformer
+    title: Nyströmformer
+  - local: model_doc/openai-gpt
     title: OpenAI GPT
   - local: model_doc/gpt2
     title: OpenAI GPT2
@@ -228,12 +268,18 @@
     title: Pegasus
   - local: model_doc/phobert
     title: PhoBERT
+  - local: model_doc/plbart
+    title: PLBart
+  - local: model_doc/poolformer
+    title: PoolFormer
   - local: model_doc/prophetnet
     title: ProphetNet
   - local: model_doc/qdqbert
     title: QDQBert
   - local: model_doc/rag
     title: RAG
+  - local: model_doc/realm
+    title: REALM
   - local: model_doc/reformer
     title: Reformer
   - local: model_doc/rembert
@@ -248,9 +294,9 @@
     title: SegFormer
   - local: model_doc/sew
     title: SEW
-  - local: model_doc/sew_d
+  - local: model_doc/sew-d
     title: SEW-D
-  - local: model_doc/speechencoderdecoder
+  - local: model_doc/speech-encoder-decoder
     title: Speech Encoder Decoder Models
   - local: model_doc/speech_to_text
     title: Speech2Text
@@ -260,40 +306,58 @@
     title: Splinter
   - local: model_doc/squeezebert
     title: SqueezeBERT
+  - local: model_doc/swin
+    title: Swin Transformer
   - local: model_doc/t5
     title: T5
   - local: model_doc/t5v1.1
     title: T5v1.1
   - local: model_doc/tapas
     title: TAPAS
-  - local: model_doc/transformerxl
+  - local: model_doc/transfo-xl
     title: Transformer XL
   - local: model_doc/trocr
     title: TrOCR
   - local: model_doc/unispeech
     title: UniSpeech
-  - local: model_doc/unispeech_sat
+  - local: model_doc/unispeech-sat
     title: UniSpeech-SAT
-  - local: model_doc/visionencoderdecoder
+  - local: model_doc/vilt
+    title: ViLT
+  - local: model_doc/vision-encoder-decoder
     title: Vision Encoder Decoder Models
-  - local: model_doc/vision_text_dual_encoder
+  - local: model_doc/vision-text-dual-encoder
     title: Vision Text Dual Encoder
   - local: model_doc/vit
     title: Vision Transformer (ViT)
+  - local: model_doc/vit_mae
+    title: ViTMAE
   - local: model_doc/visual_bert
     title: VisualBERT
   - local: model_doc/wav2vec2
     title: Wav2Vec2
+  - local: model_doc/wav2vec2_phoneme
+    title: Wav2Vec2Phoneme
+  - local: model_doc/wavlm
+    title: WavLM
+  - local: model_doc/xglm
+    title: XGLM
   - local: model_doc/xlm
     title: XLM
-  - local: model_doc/xlmprophetnet
+  - local: model_doc/xlm-prophetnet
     title: XLM-ProphetNet
-  - local: model_doc/xlmroberta
+  - local: model_doc/xlm-roberta
     title: XLM-RoBERTa
+  - local: model_doc/xlm-roberta-xl
+    title: XLM-RoBERTa-XL
   - local: model_doc/xlnet
     title: XLNet
   - local: model_doc/xlsr_wav2vec2
     title: XLSR-Wav2Vec2
+  - local: model_doc/xls_r
+    title: XLS-R
+  - local: model_doc/yoso
+    title: YOSO
   title: Models
 - sections:
   - local: internal/modeling_utils

docs/source/accelerate.mdx Normal file

@@ -0,0 +1,132 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Distributed training with 🤗 Accelerate
As models get bigger, parallelism has emerged as a strategy for training larger models on limited hardware and accelerating training speed by several orders of magnitude. At Hugging Face, we created the [🤗 Accelerate](https://huggingface.co/docs/accelerate/index.html) library to help users easily train a 🤗 Transformers model on any type of distributed setup, whether it is multiple GPUs on one machine or multiple GPUs across several machines. In this tutorial, learn how to customize your native PyTorch training loop to enable training in a distributed environment.
## Setup
Get started by installing 🤗 Accelerate:
```bash
pip install accelerate
```
Then import and create an [`Accelerator`](https://huggingface.co/docs/accelerate/accelerator.html#accelerate.Accelerator) object. `Accelerator` will automatically detect your type of distributed setup and initialize all the necessary components for training. You don't need to explicitly place your model on a device.
```py
>>> from accelerate import Accelerator
>>> accelerator = Accelerator()
```
## Prepare to accelerate
The next step is to pass all the relevant training objects to the [`prepare`](https://huggingface.co/docs/accelerate/accelerator.html#accelerate.Accelerator.prepare) method. This includes your training and evaluation DataLoaders, a model and an optimizer:
```py
>>> train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
...     train_dataloader, eval_dataloader, model, optimizer
... )
```
## Backward
The last addition is to replace the typical `loss.backward()` in your training loop with 🤗 Accelerate's [`backward`](https://huggingface.co/docs/accelerate/accelerator.html#accelerate.Accelerator.backward) method:
```py
>>> for epoch in range(num_epochs):
...     for batch in train_dataloader:
...         outputs = model(**batch)
...         loss = outputs.loss
...         accelerator.backward(loss)
...         optimizer.step()
...         lr_scheduler.step()
...         optimizer.zero_grad()
...         progress_bar.update(1)
```
As you can see in the following code, you only need to add four additional lines of code to your training loop to enable distributed training!
```diff
+ from accelerate import Accelerator
  from transformers import AdamW, AutoModelForSequenceClassification, get_scheduler

+ accelerator = Accelerator()

  model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
  optimizer = AdamW(model.parameters(), lr=3e-5)

- device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
- model.to(device)

+ train_dataloader, eval_dataloader, model, optimizer = accelerator.prepare(
+     train_dataloader, eval_dataloader, model, optimizer
+ )

  num_epochs = 3
  num_training_steps = num_epochs * len(train_dataloader)
  lr_scheduler = get_scheduler(
      "linear",
      optimizer=optimizer,
      num_warmup_steps=0,
      num_training_steps=num_training_steps
  )

  progress_bar = tqdm(range(num_training_steps))

  model.train()
  for epoch in range(num_epochs):
      for batch in train_dataloader:
-         batch = {k: v.to(device) for k, v in batch.items()}
          outputs = model(**batch)
          loss = outputs.loss
-         loss.backward()
+         accelerator.backward(loss)
          optimizer.step()
          lr_scheduler.step()
          optimizer.zero_grad()
          progress_bar.update(1)
```
## Train
Once you've added the relevant lines of code, launch your training in a script or a notebook like Colaboratory.
### Train with a script
If you are running your training from a script, run the following command to create and save a configuration file:
```bash
accelerate config
```
Then launch your training with:
```bash
accelerate launch train.py
```
### Train with a notebook
🤗 Accelerate can also run in a notebook if you're planning on using Colaboratory's TPUs. Wrap all the code responsible for training in a function, and pass it to `notebook_launcher`:
```py
>>> from accelerate import notebook_launcher
>>> notebook_launcher(training_function)
```
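If your training function takes arguments, you can pass them through `args` and set the number of processes explicitly (a minimal sketch; `training_function`'s signature and the argument names are up to you):
```py
>>> notebook_launcher(training_function, args=(model, train_dataloader), num_processes=2)
```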
For more information about 🤗 Accelerate and its rich features, refer to the [documentation](https://huggingface.co/docs/accelerate/index.html).

docs/source/add_new_pipeline.mdx Normal file

@@ -0,0 +1,140 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-->
# How to add a pipeline to 🤗 Transformers?
First and foremost, you need to decide the raw entries the pipeline will be able to take. It can be strings, raw bytes,
dictionaries or whatever seems to be the most likely desired input. Try to keep these inputs as pure Python as possible
as it makes compatibility easier (even through other languages via JSON). Those will be the `inputs` of the
pipeline (`preprocess`).
Then define the `outputs`. Same policy as the `inputs`. The simpler, the better. Those will be the outputs of
`postprocess` method.
Start by inheriting the base class `Pipeline` and implementing the 4 needed methods: `preprocess`,
`_forward`, `postprocess` and `_sanitize_parameters`.
```python
from transformers import Pipeline


class MyPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
        return preprocess_kwargs, {}, {}

    def preprocess(self, inputs, maybe_arg=2):
        model_input = Tensor(inputs["input_ids"])
        return {"model_input": model_input}

    def _forward(self, model_inputs):
        # model_inputs == {"model_input": model_input}
        outputs = self.model(**model_inputs)
        # Maybe {"logits": Tensor(...)}
        return outputs

    def postprocess(self, model_outputs):
        best_class = model_outputs["logits"].softmax(-1)
        return best_class
```
This breakdown is structured to support relatively seamless CPU/GPU handling, while allowing pre/postprocessing to be done on the CPU in different threads.
`preprocess` will take the originally defined inputs, and turn them into something feedable to the model. It might
contain more information and is usually a `Dict`.
`_forward` is the implementation detail and is not meant to be called directly. `forward` is the preferred method to call as it contains safeguards to make sure everything is working on the expected device. Anything linked to a real model belongs in the `_forward` method; anything else goes in preprocess/postprocess.
The `postprocess` method will take the output of `_forward` and turn it into the final output that was decided
earlier.
`_sanitize_parameters` exists to allow users to pass any parameters whenever they wish, be it at initialization
time `pipeline(...., maybe_arg=4)` or at call time `pipe = pipeline(...); output = pipe(...., maybe_arg=4)`.
The returns of `_sanitize_parameters` are the 3 dicts of kwargs that will be passed directly to `preprocess`,
`_forward` and `postprocess`. Don't fill anything if the caller didn't call with any extra parameter. That
allows keeping the default arguments in the function definition, which is always more "natural".
A classic example would be a `top_k` argument in the post processing in classification tasks.
```python
>>> pipe = pipeline("my-new-task")
>>> pipe("This is a test")
[{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}, {"label": "3-star", "score": 0.05}
{"label": "4-star", "score": 0.025}, {"label": "5-star", "score": 0.025}]
>>> pipe("This is a test", top_k=2)
[{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}]
```
In order to achieve that, we'll update our `postprocess` method with a default parameter of `5`, and edit
`_sanitize_parameters` to allow this new parameter.
```python
def postprocess(self, model_outputs, top_k=5):
    best_class = model_outputs["logits"].softmax(-1)
    # Add logic to handle top_k
    return best_class


def _sanitize_parameters(self, **kwargs):
    preprocess_kwargs = {}
    if "maybe_arg" in kwargs:
        preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]

    postprocess_kwargs = {}
    if "top_k" in kwargs:
        # top_k is a postprocessing parameter, so route it to postprocess
        postprocess_kwargs["top_k"] = kwargs["top_k"]
    return preprocess_kwargs, {}, postprocess_kwargs
```
Try to keep the inputs/outputs very simple and ideally JSON-serializable, as it makes the pipeline usage very easy
without requiring users to understand new kinds of objects. It's also relatively common to support many different types
of arguments for ease of use (audio files can be filenames, URLs or raw bytes, for example).
## Adding it to the list of supported tasks
Go to `src/transformers/pipelines/__init__.py` and fill in `SUPPORTED_TASKS` with your newly created pipeline.
If possible it should provide a default model.
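As a rough sketch of what such an entry looks like (check the existing entries in that file for the current schema; `AutoModelForMyNewTask`, `TFAutoModelForMyNewTask` and the default checkpoint name below are placeholders):
```python
SUPPORTED_TASKS = {
    # ... existing tasks ...
    "my-new-task": {
        "impl": MyPipeline,
        "pt": (AutoModelForMyNewTask,) if is_torch_available() else (),
        "tf": (TFAutoModelForMyNewTask,) if is_tf_available() else (),
        "default": {"model": {"pt": "user/my-default-model"}},
        "type": "text",
    },
}
```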
## Adding tests
Create a new file `tests/test_pipelines_MY_PIPELINE.py` with examples modeled on the other tests.
The `run_pipeline_test` function will be very generic and run on small random models on every possible
architecture as defined by `model_mapping` and `tf_model_mapping`.
This is very important to test future compatibility, meaning if someone adds a new model for
`XXXForQuestionAnswering` then the pipeline test will attempt to run on it. Because the models are random it's
impossible to check for actual values, that's why there is a helper `ANY` that will simply attempt to match the
TYPE of the pipeline output.
You also *need* to implement 2 (ideally 4) tests.
- `test_small_model_pt` : Define 1 small model for this pipeline (doesn't matter if the results don't make sense)
and test the pipeline outputs. The results should be the same as `test_small_model_tf`.
- `test_small_model_tf` : Define 1 small model for this pipeline (doesn't matter if the results don't make sense)
and test the pipeline outputs. The results should be the same as `test_small_model_pt`.
- `test_large_model_pt` (`optional`): Tests the pipeline on a real checkpoint where the results are supposed to
make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
sure there is no drift in future releases.
- `test_large_model_tf` (`optional`): Tests the pipeline on a real checkpoint where the results are supposed to
make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
sure there is no drift in future releases.
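For instance, a small-model test could compare against types rather than values using `ANY` (a sketch; the exact import location of `ANY` may differ, see the existing pipeline tests):
```python
outputs = pipe("This is a test")
# ANY(str) / ANY(float) match any value of that type, since small random models give meaningless values
self.assertEqual(outputs, [{"label": ANY(str), "score": ANY(float)}])
```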

docs/source/add_new_pipeline.rst
@@ -1,143 +0,0 @@

docs/source/autoclass_tutorial.mdx Normal file

@@ -0,0 +1,104 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Load pretrained instances with an AutoClass
With so many different Transformer architectures, it can be challenging to create one for your checkpoint. As part of 🤗 Transformers' core philosophy to make the library easy, simple and flexible to use, an `AutoClass` automatically infers and loads the correct architecture from a given checkpoint. The `from_pretrained` method lets you quickly load a pretrained model for any architecture so you don't have to devote time and resources to train a model from scratch. Producing this type of checkpoint-agnostic code means if your code works for one checkpoint, it will work with another checkpoint - as long as it was trained for a similar task - even if the architecture is different.
<Tip>
Remember, architecture refers to the skeleton of the model and checkpoints are the weights for a given architecture. For example, [BERT](https://huggingface.co/bert-base-uncased) is an architecture, while `bert-base-uncased` is a checkpoint. Model is a general term that can mean either architecture or checkpoint.
</Tip>
In this tutorial, learn to:
* Load a pretrained tokenizer.
* Load a pretrained feature extractor.
* Load a pretrained processor.
* Load a pretrained model.
## AutoTokenizer
Nearly every NLP task begins with a tokenizer. A tokenizer converts your input into a format that can be processed by the model.
Load a tokenizer with [`AutoTokenizer.from_pretrained`]:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```
Then tokenize your input as shown below:
```py
>>> sequence = "In a hole in the ground there lived a hobbit."
>>> print(tokenizer(sequence))
{'input_ids': [101, 1999, 1037, 4920, 1999, 1996, 2598, 2045, 2973, 1037, 7570, 10322, 4183, 1012, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
```
## AutoFeatureExtractor
For audio and vision tasks, a feature extractor processes the audio signal or image into the correct input format.
Load a feature extractor with [`AutoFeatureExtractor.from_pretrained`]:
```py
>>> from transformers import AutoFeatureExtractor
>>> feature_extractor = AutoFeatureExtractor.from_pretrained(
...     "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
... )
```
## AutoProcessor
Multimodal tasks require a processor that combines two types of preprocessing tools. For example, the [LayoutLMV2](model_doc/layoutlmv2) model requires a feature extractor to handle images and a tokenizer to handle text; a processor combines both of them.
Load a processor with [`AutoProcessor.from_pretrained`]:
```py
>>> from transformers import AutoProcessor
>>> processor = AutoProcessor.from_pretrained("microsoft/layoutlmv2-base-uncased")
```
## AutoModel
Finally, the `AutoModelFor` classes let you load a pretrained model for a given task (see [here](model_doc/auto) for a complete list of available tasks). For example, load a model for sequence classification with [`AutoModelForSequenceClassification.from_pretrained`]:
```py
>>> from transformers import AutoModelForSequenceClassification
>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
===PT-TF-SPLIT===
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
```
Easily reuse the same checkpoint to load an architecture for a different task:
```py
>>> from transformers import AutoModelForTokenClassification
>>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
===PT-TF-SPLIT===
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
```
Generally, we recommend using the `AutoTokenizer` class and the `AutoModelFor` class to load pretrained instances of models. This will ensure you load the correct architecture every time. In the next [tutorial](preprocessing), learn how to use your newly loaded tokenizer, feature extractor and processor to preprocess a dataset for fine-tuning.

docs/source/benchmarks.mdx

@@ -12,15 +12,22 @@ specific language governing permissions and limitations under the License.
 # Benchmarks

+<Tip warning={true}>
+
+Hugging Face's Benchmarking tools are deprecated and it is advised to use external Benchmarking libraries to measure the speed
+and memory complexity of Transformer models.
+
+</Tip>
+
 [[open-in-colab]]

-Let's take a look at how 🤗 Transformer models can be benchmarked, best practices, and already available benchmarks.
+Let's take a look at how 🤗 Transformers models can be benchmarked, best practices, and already available benchmarks.

-A notebook explaining in more detail how to benchmark 🤗 Transformer models can be found [here](https://github.com/huggingface/transformers/tree/master/notebooks/05-benchmark.ipynb).
+A notebook explaining in more detail how to benchmark 🤗 Transformers models can be found [here](https://github.com/huggingface/notebooks/tree/master/examples/benchmark.ipynb).

-## How to benchmark 🤗 Transformer models
+## How to benchmark 🤗 Transformers models

-The classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] allow to flexibly benchmark 🤗 Transformer models. The benchmark classes allow us to measure the _peak memory usage_ and _required time_ for both _inference_ and _training_.
+The classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] allow to flexibly benchmark 🤗 Transformers models. The benchmark classes allow us to measure the _peak memory usage_ and _required time_ for both _inference_ and _training_.

 <Tip>
@@ -37,11 +44,12 @@ The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an
 >>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
 >>> benchmark = PyTorchBenchmark(args)
 ===PT-TF-SPLIT===
 >>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments

->>> args = TensorFlowBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
+>>> args = TensorFlowBenchmarkArguments(
+...     models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
+... )
 >>> benchmark = TensorFlowBenchmark(args)
 ```
@@ -174,7 +182,9 @@ configurations must be inserted with the benchmark args as follows.
 ```py
 >>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig

->>> args = PyTorchBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
+>>> args = PyTorchBenchmarkArguments(
+...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
+... )
 >>> config_base = BertConfig()
 >>> config_384_hid = BertConfig(hidden_size=384)
 >>> config_6_lay = BertConfig(num_hidden_layers=6)
@@ -244,7 +254,9 @@ bert-6-lay 8 512 1359
 ===PT-TF-SPLIT===
 >>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig

->>> args = TensorFlowBenchmarkArguments(models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
+>>> args = TensorFlowBenchmarkArguments(
+...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
+... )
 >>> config_base = BertConfig()
 >>> config_384_hid = BertConfig(hidden_size=384)
 >>> config_6_lay = BertConfig(num_hidden_layers=6)

docs/source/bertology.mdx Normal file

@@ -0,0 +1,36 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# BERTology
There is a growing field of study concerned with investigating the inner workings of large-scale transformers like BERT
(that some call "BERTology"). Some good examples of this field are:
- BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick:
https://arxiv.org/abs/1905.05950
- Are Sixteen Heads Really Better than One? by Paul Michel, Omer Levy, Graham Neubig: https://arxiv.org/abs/1905.10650
- What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D.
Manning: https://arxiv.org/abs/1906.04341
In order to help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models to
help people access the inner representations, mainly adapted from the great work of Paul Michel
(https://arxiv.org/abs/1905.10650):
- accessing all the hidden-states of BERT/GPT/GPT-2,
- accessing all the attention weights for each head of BERT/GPT/GPT-2,
- retrieving heads output values and gradients to be able to compute head importance score and prune head as explained
in https://arxiv.org/abs/1905.10650.
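For instance, here is a minimal sketch of accessing these outputs and pruning heads with BERT:
```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True, output_attentions=True)

inputs = tokenizer("In a hole in the ground there lived a hobbit.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(len(outputs.hidden_states))  # 13: the embedding output plus one hidden state per layer
print(outputs.attentions[0].shape)  # (batch_size, num_heads, seq_len, seq_len)

# Remove heads 0 and 2 of the first layer
model.prune_heads({0: [0, 2]})
```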
To help you understand and use these features, we have added a specific example script: [bertology.py](https://github.com/huggingface/transformers/tree/master/examples/research_projects/bertology/run_bertology.py), which extracts information from and prunes a model pre-trained on
GLUE.

docs/source/bertology.rst
@@ -1,38 +0,0 @@

docs/source/community.md

@@ -62,3 +62,4 @@ This page regroups resources around 🤗 Transformers developed by the community
 | [Speech Emotion Classification with Wav2Vec2](https://github/m3hrdadfi/soxan/blob/main/notebooks/Emotion_recognition_in_Greek_speech_using_Wav2Vec2.ipynb) | How to leverage a pretrained Wav2Vec2 model for Emotion Classification on the MEGA dataset | [Mehrdad Farahani](https://github.com/m3hrdadfi) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/m3hrdadfi/soxan/blob/main/notebooks/Emotion_recognition_in_Greek_speech_using_Wav2Vec2.ipynb) |
 | [Detect objects in an image with DETR](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/DETR_minimal_example_(with_DetrFeatureExtractor).ipynb) | How to use a trained *DetrForObjectDetection* model to detect objects in an image and visualize attention | [Niels Rogge](https://github.com/NielsRogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/DETR/DETR_minimal_example_(with_DetrFeatureExtractor).ipynb) |
 | [Fine-tune DETR on a custom object detection dataset](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb) | How to fine-tune *DetrForObjectDetection* on a custom object detection dataset | [Niels Rogge](https://github.com/NielsRogge) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb) |
+| [Finetune T5 for Named Entity Recognition](https://github.com/ToluClassics/Notebooks/blob/main/T5_Ner_Finetuning.ipynb) | How to fine-tune *T5* on a Named Entity Recognition Task | [Ogundepo Odunayo](https://github.com/ToluClassics) | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1obr78FY_cBmWY5ODViCmzdY6O1KB65Vc?usp=sharing) |

docs/source/converting_tensorflow_models.mdx Normal file

@@ -0,0 +1,162 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Converting TensorFlow Checkpoints
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints to models
that can be loaded using the `from_pretrained` methods of the library.
<Tip>
Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any
transformers >= 2.3.0 installation.
The documentation below reflects the **transformers-cli convert** command format.
</Tip>
## BERT
You can convert any TensorFlow checkpoint for BERT (in particular [the pre-trained models released by Google](https://github.com/google-research/bert#pre-trained-models)) in a PyTorch save file by using the
[convert_bert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/master/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py) script.
This CLI takes as input a TensorFlow checkpoint (three files starting with `bert_model.ckpt`) and the associated
configuration file (`bert_config.json`), and creates a PyTorch model for this configuration, loads the weights from
the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can
be imported using `from_pretrained()` (see example in [quicktour](quicktour) , [run_glue.py](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification/run_glue.py) ).
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
checkpoint (the three files starting with `bert_model.ckpt`) but be sure to keep the configuration file
(`bert_config.json`) and the vocabulary file (`vocab.txt`) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (`pip install tensorflow`). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained `BERT-Base Uncased` model:
```bash
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
transformers-cli convert --model_type bert \
--tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
--config $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
```
You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/bert#pre-trained-models).
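Once converted, the checkpoint can be loaded back in Python. A minimal sketch (assuming the conversion above placed `pytorch_model.bin` next to `bert_config.json`; passing a state-dict file directly requires an explicit config):
```python
from transformers import BertConfig, BertForPreTraining

config = BertConfig.from_json_file("/path/to/bert/uncased_L-12_H-768_A-12/bert_config.json")
model = BertForPreTraining.from_pretrained(
    "/path/to/bert/uncased_L-12_H-768_A-12/pytorch_model.bin", config=config
)
```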
## ALBERT
Convert TensorFlow model checkpoints of ALBERT to PyTorch using the
[convert_albert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/master/src/transformers/models/albert/convert_albert_original_tf_checkpoint_to_pytorch.py) script.
The CLI takes as input a TensorFlow checkpoint (three files starting with `model.ckpt-best`) and the accompanying
configuration file (`albert_config.json`), then creates and saves a PyTorch model. To run this conversion you will
need to have TensorFlow and PyTorch installed.
Here is an example of the conversion process for the pre-trained `ALBERT Base` model:
```bash
export ALBERT_BASE_DIR=/path/to/albert/albert_base
transformers-cli convert --model_type albert \
--tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
--config $ALBERT_BASE_DIR/albert_config.json \
--pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin
```
You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/albert#pre-trained-models).
## OpenAI GPT
Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint
is saved in the same format as the OpenAI pretrained model (see [here](https://github.com/openai/finetune-transformer-lm)):
```bash
export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
transformers-cli convert --model_type gpt \
--tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT_CONFIG] \
[--finetuning_task_name OPENAI_GPT_FINETUNED_TASK]
```
## OpenAI GPT-2
Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see [here](https://github.com/openai/gpt-2))
```bash
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
transformers-cli convert --model_type gpt2 \
--tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT2_CONFIG] \
[--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
```
## Transformer-XL
Here is an example of the conversion process for a pre-trained Transformer-XL model (see [here](https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models))
```bash
export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
transformers-cli convert --model_type transfo_xl \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config TRANSFO_XL_CONFIG] \
[--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]
```
## XLNet
Here is an example of the conversion process for a pre-trained XLNet model:
```bash
export XLNET_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
export XLNET_CONFIG_PATH=/path/to/xlnet/config
transformers-cli convert --model_type xlnet \
--tf_checkpoint $XLNET_CHECKPOINT_PATH \
--config $XLNET_CONFIG_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--finetuning_task_name XLNET_FINETUNED_TASK]
```
## XLM
Here is an example of the conversion process for a pre-trained XLM model:
```bash
export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
transformers-cli convert --model_type xlm \
--tf_checkpoint $XLM_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config XLM_CONFIG] \
[--finetuning_task_name XLM_FINETUNED_TASK]
```
## T5
Here is an example of the conversion process for a pre-trained T5 model:
```bash
export T5=/path/to/t5/uncased_L-12_H-768_A-12
transformers-cli convert --model_type t5 \
--tf_checkpoint $T5/t5_model.ckpt \
--config $T5/t5_config.json \
--pytorch_dump_output $T5/pytorch_model.bin
```

View File

@ -1,181 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Converting Tensorflow Checkpoints
=======================================================================================================================
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints to models
that can be loaded using the ``from_pretrained`` methods of the library.
.. note::
Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any
transformers >= 2.3.0 installation.
The documentation below reflects the **transformers-cli convert** command format.
BERT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can convert any TensorFlow checkpoint for BERT (in particular `the pre-trained models released by Google
<https://github.com/google-research/bert#pre-trained-models>`_) in a PyTorch save file by using the
:prefix_link:`convert_bert_original_tf_checkpoint_to_pytorch.py
<src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py>` script.
This CLI takes as input a TensorFlow checkpoint (three files starting with ``bert_model.ckpt``) and the associated
configuration file (``bert_config.json``), and creates a PyTorch model for this configuration, loads the weights from
the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can
be imported using ``from_pretrained()`` (see example in :doc:`quicktour` , :prefix_link:`run_glue.py
<examples/pytorch/text-classification/run_glue.py>` ).
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
checkpoint (the three files starting with ``bert_model.ckpt``) but be sure to keep the configuration file (\
``bert_config.json``) and the vocabulary file (``vocab.txt``) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (``pip install
tensorflow``). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained ``BERT-Base Uncased`` model:
.. code-block:: shell
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
transformers-cli convert --model_type bert \
--tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
--config $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
You can download Google's pre-trained models for the conversion `here
<https://github.com/google-research/bert#pre-trained-models>`__.
ALBERT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Convert TensorFlow model checkpoints of ALBERT to PyTorch using the
:prefix_link:`convert_albert_original_tf_checkpoint_to_pytorch.py
<src/transformers/models/albert/convert_albert_original_tf_checkpoint_to_pytorch.py>` script.
The CLI takes as input a TensorFlow checkpoint (three files starting with ``model.ckpt-best``) and the accompanying
configuration file (``albert_config.json``), then creates and saves a PyTorch model. To run this conversion you will
need to have TensorFlow and PyTorch installed.
Here is an example of the conversion process for the pre-trained ``ALBERT Base`` model:
.. code-block:: shell
export ALBERT_BASE_DIR=/path/to/albert/albert_base
transformers-cli convert --model_type albert \
--tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
--config $ALBERT_BASE_DIR/albert_config.json \
--pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin
You can download Google's pre-trained models for the conversion `here
<https://github.com/google-research/albert#pre-trained-models>`__.
OpenAI GPT
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint
save as the same format than OpenAI pretrained model (see `here <https://github.com/openai/finetune-transformer-lm>`__\
)
.. code-block:: shell
export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
transformers-cli convert --model_type gpt \
--tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT_CONFIG] \
[--finetuning_task_name OPENAI_GPT_FINETUNED_TASK] \
OpenAI GPT-2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see `here
<https://github.com/openai/gpt-2>`__)
.. code-block:: shell
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
transformers-cli convert --model_type gpt2 \
--tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT2_CONFIG] \
[--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
Transformer-XL
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained Transformer-XL model (see `here
<https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models>`__)
.. code-block:: shell
export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
transformers-cli convert --model_type transfo_xl \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config TRANSFO_XL_CONFIG] \
[--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]
XLNet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained XLNet model:
.. code-block:: shell
export TRANSFO_XL_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
export TRANSFO_XL_CONFIG_PATH=/path/to/xlnet/config
transformers-cli convert --model_type xlnet \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_PATH \
--config $TRANSFO_XL_CONFIG_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--finetuning_task_name XLNET_FINETUNED_TASK] \
XLM
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained XLM model:
.. code-block:: shell
export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
transformers-cli convert --model_type xlm \
--tf_checkpoint $XLM_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config XLM_CONFIG] \
[--finetuning_task_name XLM_FINETUNED_TASK]
T5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here is an example of the conversion process for a pre-trained T5 model:
.. code-block:: shell
export T5=/path/to/t5/uncased_L-12_H-768_A-12
transformers-cli convert --model_type t5 \
--tf_checkpoint $T5/t5_model.ckpt \
--config $T5/t5_config.json \
--pytorch_dump_output $T5/pytorch_model.bin


@ -0,0 +1,323 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Create a custom model
An [`AutoClass`](model_doc/auto) automatically infers the model architecture and downloads pretrained configuration and weights. Generally, we recommend using an `AutoClass` to produce checkpoint-agnostic code. But users who want more control over specific model parameters can create a custom 🤗 Transformers model from just a few base classes. This could be particularly useful for anyone who is interested in studying, training or experimenting with a 🤗 Transformers model. In this guide, dive deeper into creating a custom model without an `AutoClass`. Learn how to:
- Load and customize a model configuration.
- Create a model architecture.
- Create a slow and fast tokenizer for text.
- Create a feature extractor for audio or image tasks.
- Create a processor for multimodal tasks.
## Configuration
A [configuration](main_classes/configuration) refers to a model's specific attributes. Each model configuration has different attributes; for instance, all NLP models have the `hidden_size`, `num_attention_heads`, `num_hidden_layers` and `vocab_size` attributes in common. These attributes specify the number of attention heads or hidden layers to construct a model with.
Get a closer look at [DistilBERT](model_doc/distilbert) by accessing [`DistilBertConfig`] to inspect its attributes:
```py
>>> from transformers import DistilBertConfig
>>> config = DistilBertConfig()
>>> print(config)
DistilBertConfig {
"activation": "gelu",
"attention_dropout": 0.1,
"dim": 768,
"dropout": 0.1,
"hidden_dim": 3072,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"model_type": "distilbert",
"n_heads": 12,
"n_layers": 6,
"pad_token_id": 0,
"qa_dropout": 0.1,
"seq_classif_dropout": 0.2,
"sinusoidal_pos_embds": false,
"transformers_version": "4.16.2",
"vocab_size": 30522
}
```
[`DistilBertConfig`] displays all the default attributes used to build a base [`DistilBertModel`]. All attributes are customizable, creating space for experimentation. For example, you can customize a default model to:
- Try a different activation function with the `activation` parameter.
- Use a higher dropout ratio for the attention probabilities with the `attention_dropout` parameter.
```py
>>> my_config = DistilBertConfig(activation="relu", attention_dropout=0.4)
>>> print(my_config)
DistilBertConfig {
"activation": "relu",
"attention_dropout": 0.4,
"dim": 768,
"dropout": 0.1,
"hidden_dim": 3072,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"model_type": "distilbert",
"n_heads": 12,
"n_layers": 6,
"pad_token_id": 0,
"qa_dropout": 0.1,
"seq_classif_dropout": 0.2,
"sinusoidal_pos_embds": false,
"transformers_version": "4.16.2",
"vocab_size": 30522
}
```
Pretrained model attributes can be modified in the [`~PretrainedConfig.from_pretrained`] function:
```py
>>> my_config = DistilBertConfig.from_pretrained("distilbert-base-uncased", activation="relu", attention_dropout=0.4)
```
Once you are satisfied with your model configuration, you can save it with [`~PretrainedConfig.save_pretrained`]. Your configuration file is stored as a JSON file in the specified save directory:
```py
>>> my_config.save_pretrained(save_directory="./your_model_save_path")
```
To reuse the configuration file, load it with [`~PretrainedConfig.from_pretrained`]:
```py
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
```
<Tip>
You can also save your configuration file as a dictionary or even just the difference between your custom configuration attributes and the default configuration attributes! See the [configuration](main_classes/configuration) documentation for more details.
</Tip>
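For example, here is a minimal sketch of both options (the exact contents of the diff can vary with your `transformers` version):
```py
>>> config_dict = my_config.to_dict()  # the full configuration as a plain dictionary
>>> diff_dict = my_config.to_diff_dict()  # only the attributes that differ from the defaults
```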
## Model
The next step is to create a [model](main_classes/models). The model - also loosely referred to as the architecture - defines what each layer is doing and what operations are happening. Attributes like `num_hidden_layers` from the configuration are used to define the architecture. Every model shares the base class [`PreTrainedModel`] and a few common methods like resizing input embeddings and pruning self-attention heads. In addition, all models are also either a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html), [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) or [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/flax.linen.html#module) subclass. This means models are compatible with each of their respective framework's usage.
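For instance, here is a small sketch of two of those shared methods (the argument values are arbitrary examples):
```py
>>> from transformers import DistilBertModel

>>> model = DistilBertModel.from_pretrained("distilbert-base-uncased")
>>> embeddings = model.resize_token_embeddings(30528)  # e.g. after adding new tokens to a tokenizer
>>> model.prune_heads({0: [0, 2]})  # prune attention heads 0 and 2 in the first layer
```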
Load your custom configuration attributes into the model:
```py
>>> from transformers import DistilBertModel
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
>>> model = DistilBertModel(my_config)
===PT-TF-SPLIT===
>>> from transformers import TFDistilBertModel
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
>>> tf_model = TFDistilBertModel(my_config)
```
This creates a model with random values instead of pretrained weights. You won't be able to use this model for anything useful until you train it. Training is a costly and time-consuming process. It is generally better to use a pretrained model to obtain better results faster, while using only a fraction of the resources required for training.
Create a pretrained model with [`~PreTrainedModel.from_pretrained`]:
```py
>>> model = DistilBertModel.from_pretrained("distilbert-base-uncased")
===PT-TF-SPLIT===
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased")
```
When you load pretrained weights, the default model configuration is automatically loaded if the model is provided by 🤗 Transformers. However, you can still replace some or all of the default model configuration attributes with your own if you'd like:
```py
>>> model = DistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)
===PT-TF-SPLIT===
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)
```
### Model heads
At this point, you have a base DistilBERT model which outputs the *hidden states*. The hidden states are passed as inputs to a model head to produce the final output. 🤗 Transformers provides a different model head for each task as long as a model supports the task (i.e., you can't use DistilBERT for a sequence-to-sequence task like translation).
For example, [`DistilBertForSequenceClassification`] is a base DistilBERT model with a sequence classification head. The sequence classification head is a linear layer on top of the pooled outputs.
```py
>>> from transformers import DistilBertForSequenceClassification
>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
===PT-TF-SPLIT===
>>> from transformers import TFDistilBertForSequenceClassification
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
```
Easily reuse this checkpoint for another task by switching to a different model head. For a question answering task, you would use the [`DistilBertForQuestionAnswering`] model head. The question answering head is similar to the sequence classification head except it is a linear layer on top of the hidden states output.
```py
>>> from transformers import DistilBertForQuestionAnswering
>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
===PT-TF-SPLIT===
>>> from transformers import TFDistilBertForQuestionAnswering
>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
```
## Tokenizer
The last base class you need before using a model for textual data is a [tokenizer](main_classes/tokenizer) to convert raw text to tensors. There are two types of tokenizers you can use with 🤗 Transformers:
- [`PreTrainedTokenizer`]: a Python implementation of a tokenizer.
- [`PreTrainedTokenizerFast`]: a tokenizer from our Rust-based [🤗 Tokenizers](https://huggingface.co/docs/tokenizers/python/latest/) library. This tokenizer type is significantly faster - especially during batch tokenization - due to its Rust implementation. The fast tokenizer also offers additional methods like *offset mapping*, which maps tokens to their original words or characters.
Both tokenizers support common methods such as encoding and decoding, adding new tokens, and managing special tokens.
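For instance, a minimal sketch of these shared methods (the decoded string shown is what `distilbert-base-uncased` produces):
```py
>>> from transformers import DistilBertTokenizerFast

>>> tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
>>> ids = tokenizer.encode("Hello world!")  # encode: text to token ids, special tokens included
>>> tokenizer.decode(ids)  # decode: token ids back to text
'[CLS] hello world! [SEP]'
>>> tokenizer.add_tokens(["[CUSTOM]"])  # add a new token; returns the number of tokens added
1
```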
<Tip warning={true}>
Not every model supports a fast tokenizer. Take a look at this [table](index#supported-frameworks) to check if a model has fast tokenizer support.
</Tip>
If you trained your own tokenizer, you can create one from your *vocabulary* file:
```py
>>> from transformers import DistilBertTokenizer
>>> my_tokenizer = DistilBertTokenizer(vocab_file="my_vocab_file.txt", do_lower_case=False, padding_side="left")
```
It is important to remember that the vocabulary from a custom tokenizer will be different from the vocabulary generated by a pretrained model's tokenizer. You need to use a pretrained model's vocabulary if you are using a pretrained model, otherwise the inputs won't make sense. Create a tokenizer with a pretrained model's vocabulary with the [`DistilBertTokenizer`] class:
```py
>>> from transformers import DistilBertTokenizer
>>> slow_tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
```
Create a fast tokenizer with the [`DistilBertTokenizerFast`] class:
```py
>>> from transformers import DistilBertTokenizerFast
>>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
```
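As an illustration of the fast-only features mentioned above, here is a sketch of *offset mapping* (the exact spans shown are for this particular input):
```py
>>> encoding = fast_tokenizer("Hello world!", return_offsets_mapping=True)
>>> encoding["offset_mapping"]  # (start, end) character spans; (0, 0) marks special tokens
[(0, 0), (0, 5), (6, 11), (11, 12), (0, 0)]
```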
<Tip>
By default, [`AutoTokenizer`] will try to load a fast tokenizer. You can disable this behavior by setting `use_fast=False` in `from_pretrained`.
</Tip>
## Feature Extractor
A feature extractor processes audio or image inputs. It inherits from the base [`~feature_extraction_utils.FeatureExtractionMixin`] class, and may also inherit from the [`ImageFeatureExtractionMixin`] class for processing image features or the [`SequenceFeatureExtractor`] class for processing audio inputs.
Depending on whether you are working on an audio or vision task, create a feature extractor associated with the model you're using. For example, create a default [`ViTFeatureExtractor`] if you are using [ViT](model_doc/vit) for image classification:
```py
>>> from transformers import ViTFeatureExtractor
>>> vit_extractor = ViTFeatureExtractor()
>>> print(vit_extractor)
ViTFeatureExtractor {
"do_normalize": true,
"do_resize": true,
"feature_extractor_type": "ViTFeatureExtractor",
"image_mean": [
0.5,
0.5,
0.5
],
"image_std": [
0.5,
0.5,
0.5
],
"resample": 2,
"size": 224
}
```
<Tip>
If you aren't looking for any customization, just use the `from_pretrained` method to load a model's default feature extractor parameters.
</Tip>
Modify any of the [`ViTFeatureExtractor`] parameters to create your custom feature extractor:
```py
>>> from transformers import ViTFeatureExtractor
>>> my_vit_extractor = ViTFeatureExtractor(resample="PIL.Image.BOX", do_normalize=False, image_mean=[0.3, 0.3, 0.3])
>>> print(my_vit_extractor)
ViTFeatureExtractor {
"do_normalize": false,
"do_resize": true,
"feature_extractor_type": "ViTFeatureExtractor",
"image_mean": [
0.3,
0.3,
0.3
],
"image_std": [
0.5,
0.5,
0.5
],
"resample": "PIL.Image.BOX",
"size": 224
}
```
For audio inputs, you can create a [`Wav2Vec2FeatureExtractor`] and customize the parameters in a similar way:
```py
>>> from transformers import Wav2Vec2FeatureExtractor
>>> w2v2_extractor = Wav2Vec2FeatureExtractor()
>>> print(w2v2_extractor)
Wav2Vec2FeatureExtractor {
"do_normalize": true,
"feature_extractor_type": "Wav2Vec2FeatureExtractor",
"feature_size": 1,
"padding_side": "right",
"padding_value": 0.0,
"return_attention_mask": false,
"sampling_rate": 16000
}
```
## Processor
For models that support multimodal tasks, 🤗 Transformers offers a processor class that conveniently wraps a feature extractor and tokenizer into a single object. For example, let's use the [`Wav2Vec2Processor`] for an automatic speech recognition task (ASR). ASR transcribes audio to text, so you will need a feature extractor and a tokenizer.
Create a feature extractor to handle the audio inputs:
```py
>>> from transformers import Wav2Vec2FeatureExtractor
>>> feature_extractor = Wav2Vec2FeatureExtractor(padding_value=1.0, do_normalize=True)
```
Create a tokenizer to handle the text inputs:
```py
>>> from transformers import Wav2Vec2CTCTokenizer
>>> tokenizer = Wav2Vec2CTCTokenizer(vocab_file="my_vocab_file.txt")
```
Combine the feature extractor and tokenizer in [`Wav2Vec2Processor`]:
```py
>>> from transformers import Wav2Vec2Processor
>>> processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)
```
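As a quick sanity check, here is a minimal sketch of calling the combined processor on a dummy waveform (the one-second array of zeros is just a stand-in for real audio):
```py
>>> import numpy as np

>>> dummy_audio = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
>>> inputs = processor(dummy_audio, sampling_rate=16000, return_tensors="pt")
```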
With two basic classes - configuration and model - and an additional preprocessing class (tokenizer, feature extractor, or processor), you can create any of the models supported by 🤗 Transformers. Each of these base classes is configurable, allowing you to use the specific attributes you want. You can easily set up a model for training or modify an existing pretrained model to fine-tune.


@ -54,6 +54,7 @@ The 🤗 Datasets library makes it simple to load a dataset:
```python
from datasets import load_dataset

imdb = load_dataset("imdb")
```
@ -61,8 +62,9 @@ This loads a `DatasetDict` object which you can index into to view an example:
```python
imdb["train"][0]
{
    "label": 1,
    "text": "Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as \"Teachers\". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is \"Teachers\". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!",
}
```
@ -74,6 +76,7 @@ model was trained with to ensure appropriately tokenized words. Load the DistilB
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```
@ -99,6 +102,7 @@ batch. This is known as **dynamic padding**. You can do this with the `DataColla
```python
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```
@ -108,6 +112,7 @@ Now load your model with the [`AutoModelForSequenceClassification`] class along
```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
```
@ -121,7 +126,7 @@ At this point, only three steps remain:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
@ -150,6 +155,7 @@ Make sure you set `return_tensors="tf"` to return `tf.Tensor` outputs instead of
```python
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer, return_tensors="tf")
```
@ -158,14 +164,14 @@ Next, convert your datasets to the `tf.data.Dataset` format with `to_tf_dataset`
```python
tf_train_dataset = tokenized_imdb["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "label"],
    shuffle=True,
    batch_size=16,
    collate_fn=data_collator,
)

tf_validation_dataset = tokenized_imdb["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "label"],
    shuffle=False,
    batch_size=16,
    collate_fn=data_collator,
@ -182,17 +188,14 @@ batch_size = 16
num_epochs = 5
batches_per_epoch = len(tokenized_imdb["train"]) // batch_size
total_train_steps = int(batches_per_epoch * num_epochs)
optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
```
Load your model with the [`TFAutoModelForSequenceClassification`] class along with the number of expected labels:
```python
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
```
@ -200,6 +203,7 @@ Compile the model:
```python
import tensorflow as tf

model.compile(optimizer=optimizer)
```
@ -234,14 +238,15 @@ or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/no
Load the WNUT 17 dataset from the 🤗 Datasets library:
```python
>>> from datasets import load_dataset

>>> wnut = load_dataset("wnut_17")
```
A quick look at the dataset shows the labels associated with each word in the sentence:
```python
>>> wnut["train"][0]
{'id': '0',
 'ner_tags': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 8, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0],
 'tokens': ['@paulwalk', 'It', "'s", 'the', 'view', 'from', 'where', 'I', "'m", 'living', 'for', 'two', 'weeks', '.', 'Empire', 'State', 'Building', '=', 'ESB', '.', 'Pretty', 'bad', 'storm', 'here', 'last', 'evening', '.']
@ -251,21 +256,22 @@ wnut["train"][0]
View the specific NER tags by:
```python
>>> label_list = wnut["train"].features[f"ner_tags"].feature.names
>>> label_list
[
    "O",
    "B-corporation",
    "I-corporation",
    "B-creative-work",
    "I-creative-work",
    "B-group",
    "I-group",
    "B-location",
    "I-location",
    "B-person",
    "I-person",
    "B-product",
    "I-product",
]
```
@ -282,6 +288,7 @@ Now you need to tokenize the text. Load the DistilBERT tokenizer with an [`AutoT
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```
@ -289,9 +296,9 @@ Since the input has already been split into words, set `is_split_into_words=True
subwords:
```python
>>> tokenized_input = tokenizer(example["tokens"], is_split_into_words=True)
>>> tokens = tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])
>>> tokens
['[CLS]', '@', 'paul', '##walk', 'it', "'", 's', 'the', 'view', 'from', 'where', 'i', "'", 'm', 'living', 'for', 'two', 'weeks', '.', 'empire', 'state', 'building', '=', 'es', '##b', '.', 'pretty', 'bad', 'storm', 'here', 'last', 'evening', '.', '[SEP]']
```
@ -314,12 +321,14 @@ def tokenize_and_align_labels(examples):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:  # Set the special tokens to -100.
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])
            else:
                label_ids.append(-100)
            previous_word_idx = word_idx
        labels.append(label_ids)

    tokenized_inputs["labels"] = labels
@ -336,6 +345,7 @@ Finally, pad your text and labels, so they are a uniform length:
```python
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer)
```
@ -345,6 +355,7 @@ Load your model with the [`AutoModelForTokenClassification`] class along with th
```python
from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer

model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
```
@ -352,7 +363,7 @@ Gather your training arguments in [`TrainingArguments`]:
```python
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
@ -387,6 +398,7 @@ Batch your examples together and pad your text and labels, so they are a uniform
```python
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer, return_tensors="tf")
```
@ -412,6 +424,7 @@ Load the model with the [`TFAutoModelForTokenClassification`] class along with t
```python
from transformers import TFAutoModelForTokenClassification

model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
```
@ -435,6 +448,7 @@ Compile the model:
```python
import tensorflow as tf

model.compile(optimizer=optimizer)
```
@ -469,13 +483,14 @@ Load the SQuAD dataset from the 🤗 Datasets library:
```python
from datasets import load_dataset

squad = load_dataset("squad")
```
Take a look at an example from the dataset:
```python
>>> squad["train"][0]
{'answers': {'answer_start': [515], 'text': ['Saint Bernadette Soubirous']},
 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
 'id': '5733be284776f41900661182',
@ -490,6 +505,7 @@ Load the DistilBERT tokenizer with an [`AutoTokenizer`]:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```
@ -567,6 +583,7 @@ Batch the processed examples together:
```python
from transformers import default_data_collator

data_collator = default_data_collator
```
@ -576,6 +593,7 @@ Load your model with the [`AutoModelForQuestionAnswering`] class:
```python
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
```
@ -583,7 +601,7 @@ Gather your training arguments in [`TrainingArguments`]:
```python
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
@ -618,6 +636,7 @@ Batch the processed examples together with a TensorFlow default data collator:
```python
from transformers.data.data_collator import tf_default_collator

data_collator = tf_default_collator
```
@ -650,8 +669,8 @@ batch_size = 16
num_epochs = 2
total_train_steps = (len(tokenized_squad["train"]) // batch_size) * num_epochs
optimizer, schedule = create_optimizer(
    init_lr=2e-5,
    num_warmup_steps=0,
    num_train_steps=total_train_steps,
)
```
@ -660,6 +679,7 @@ Load your model with the [`TFAutoModelForQuestionAnswering`] class:
```python
from transformers import TFAutoModelForQuestionAnswering

model = TFAutoModelForQuestionAnswering("distilbert-base-uncased")
```
@ -667,6 +687,7 @@ Compile the model:
```python
import tensorflow as tf

model.compile(optimizer=optimizer)
```


@ -0,0 +1,349 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Sharing custom models
The 🤗 Transformers library is designed to be easily extensible. Every model is fully coded in a given subfolder
of the repository with no abstraction, so you can easily copy a modeling file and tweak it to your needs.
If you are writing a brand new model, it might be easier to start from scratch. In this tutorial, we will show you
how to write a custom model and its configuration so it can be used inside Transformers, and how you can share it
with the community (with the code it relies on) so that anyone can use it, even if it's not present in the 🤗
Transformers library.
We will illustrate all of this on a ResNet model, by wrapping the ResNet class of the
[timm library](https://github.com/rwightman/pytorch-image-models/tree/master/timm) into a [`PreTrainedModel`].
## Writing a custom configuration
Before we dive into the model, let's first write its configuration. The configuration of a model is an object that
will contain all the necessary information to build the model. As we will see in the next section, the model can only
take a `config` to be initialized, so we really need that object to be as complete as possible.
In our example, we will take a couple of arguments of the ResNet class that we might want to tweak. Different
configurations will then give us the different types of ResNets that are possible. We then just store those arguments,
after checking the validity of a few of them.
```python
from transformers import PretrainedConfig
from typing import List
class ResnetConfig(PretrainedConfig):
    model_type = "resnet"

    def __init__(
        self,
        block_type="bottleneck",
        layers: List[int] = [3, 4, 6, 3],
        num_classes: int = 1000,
        input_channels: int = 3,
        cardinality: int = 1,
        base_width: int = 64,
        stem_width: int = 64,
        stem_type: str = "",
        avg_down: bool = False,
        **kwargs,
    ):
        if block_type not in ["basic", "bottleneck"]:
            raise ValueError(f"`block_type` must be 'basic' or 'bottleneck', got {block_type}.")
        if stem_type not in ["", "deep", "deep-tiered"]:
            raise ValueError(f"`stem_type` must be '', 'deep' or 'deep-tiered', got {stem_type}.")

        self.block_type = block_type
        self.layers = layers
        self.num_classes = num_classes
        self.input_channels = input_channels
        self.cardinality = cardinality
        self.base_width = base_width
        self.stem_width = stem_width
        self.stem_type = stem_type
        self.avg_down = avg_down
        super().__init__(**kwargs)
```
The three important things to remember when writing your own configuration are the following:
- you have to inherit from `PretrainedConfig`,
- the `__init__` of your `PretrainedConfig` must accept any kwargs,
- those `kwargs` need to be passed to the superclass `__init__`.
The inheritance is to make sure you get all the functionality from the 🤗 Transformers library, while the two other
constraints come from the fact that a `PretrainedConfig` has more fields than the ones you are setting. When reloading a
config with the `from_pretrained` method, those fields need to be accepted by your config and then sent to the
superclass.
Defining a `model_type` for your configuration (here `model_type="resnet"`) is not mandatory, unless you want to
register your model with the auto classes (see last section).
With this done, you can easily create and save your configuration like you would do with any other model config of the
library. Here is how we can create a resnet50d config and save it:
```py
resnet50d_config = ResnetConfig(block_type="bottleneck", stem_width=32, stem_type="deep", avg_down=True)
resnet50d_config.save_pretrained("custom-resnet")
```
This will save a file named `config.json` inside the folder `custom-resnet`. You can then reload your config with the
`from_pretrained` method:
```py
resnet50d_config = ResnetConfig.from_pretrained("custom-resnet")
```
You can also use any other method of the [`PretrainedConfig`] class, like [`~PretrainedConfig.push_to_hub`] to
directly upload your config to the Hub.
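For instance, assuming you are logged in to the Hub (see the sharing section below), uploading the config is a one-liner:
```py
resnet50d_config.push_to_hub("custom-resnet")  # creates the repo if needed and uploads config.json
```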
## Writing a custom model
Now that we have our ResNet configuration, we can go on writing the model. We will actually write two: one that
extracts the hidden features from a batch of images (like [`BertModel`]) and one that is suitable for image
classification (like [`BertForSequenceClassification`]).
As we mentioned before, we'll only write a loose wrapper of the model to keep it simple for this example. The only
thing we need to do before writing this class is a map between the block types and actual block classes. Then the
model is defined from the configuration by passing everything to the `ResNet` class:
```py
from transformers import PreTrainedModel
from timm.models.resnet import BasicBlock, Bottleneck, ResNet
from .configuration_resnet import ResnetConfig
BLOCK_MAPPING = {"basic": BasicBlock, "bottleneck": Bottleneck}
class ResnetModel(PreTrainedModel):
    config_class = ResnetConfig

    def __init__(self, config):
        super().__init__(config)
        block_layer = BLOCK_MAPPING[config.block_type]
        self.model = ResNet(
            block_layer,
            config.layers,
            num_classes=config.num_classes,
            in_chans=config.input_channels,
            cardinality=config.cardinality,
            base_width=config.base_width,
            stem_width=config.stem_width,
            stem_type=config.stem_type,
            avg_down=config.avg_down,
        )

    def forward(self, tensor):
        return self.model.forward_features(tensor)
```
For the model that will classify images, we just change the forward method:
```py
import torch


class ResnetModelForImageClassification(PreTrainedModel):
    config_class = ResnetConfig

    def __init__(self, config):
        super().__init__(config)
        block_layer = BLOCK_MAPPING[config.block_type]
        self.model = ResNet(
            block_layer,
            config.layers,
            num_classes=config.num_classes,
            in_chans=config.input_channels,
            cardinality=config.cardinality,
            base_width=config.base_width,
            stem_width=config.stem_width,
            stem_type=config.stem_type,
            avg_down=config.avg_down,
        )

    def forward(self, tensor, labels=None):
        logits = self.model(tensor)
        if labels is not None:
            loss = torch.nn.functional.cross_entropy(logits, labels)
            return {"loss": loss, "logits": logits}
        return {"logits": logits}
```
In both cases, notice how we inherit from `PreTrainedModel` and call the superclass initialization with the `config`
(a bit like when you write a regular `torch.nn.Module`). The line that sets the `config_class` is not mandatory, unless
you want to register your model with the auto classes (see last section).
<Tip>
If your model is very similar to a model inside the library, you can re-use the same configuration as this model.
</Tip>
You can have your model return anything you want, but returning a dictionary like we did for
`ResnetModelForImageClassification`, with the loss included when labels are passed, will make your model directly
usable inside the [`Trainer`] class. Using another output format is fine as long as you are planning on using your own
training loop or another library for training.
Now that we have our model class, let's create one:
```py
resnet50d = ResnetModelForImageClassification(resnet50d_config)
```
Again, you can use any of the methods of [`PreTrainedModel`], like [`~PreTrainedModel.save_pretrained`] or
[`~PreTrainedModel.push_to_hub`]. We will use the second in the next section, and see how to push the model weights
with the code of our model. But first, let's load some pretrained weights inside our model.
In your own use case, you will probably be training your custom model on your own data. To keep things fast for this
tutorial, we will use the pretrained version of the resnet50d. Since our model is just a wrapper around it, it's going
to be easy to transfer those weights:
```py
import timm
pretrained_model = timm.create_model("resnet50d", pretrained=True)
resnet50d.model.load_state_dict(pretrained_model.state_dict())
```
Now let's see how to make sure that when we do [`~PreTrainedModel.save_pretrained`] or [`~PreTrainedModel.push_to_hub`], the
code of the model is saved.
## Sending the code to the Hub
<Tip warning={true}>
This API is experimental and may have some slight breaking changes in the next releases.
</Tip>
First, make sure your model is fully defined in a `.py` file. It can rely on relative imports to some other files as
long as all the files are in the same directory (we don't support submodules for this feature yet). For our example,
we'll define a `modeling_resnet.py` file and a `configuration_resnet.py` file in a folder of the current working
directory named `resnet_model`. The configuration file contains the code for `ResnetConfig` and the modeling file
contains the code of `ResnetModel` and `ResnetModelForImageClassification`.
```
.
└── resnet_model
├── __init__.py
├── configuration_resnet.py
└── modeling_resnet.py
```
The `__init__.py` can be empty, it's just there so that Python detects `resnet_model` can be used as a module.
<Tip warning={true}>
If you copy modeling files from the library, you will need to replace all the relative imports at the top of the file
with imports from the `transformers` package.
</Tip>
Note that you can re-use (or subclass) an existing configuration/model.
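For example, here is a hypothetical sketch of subclassing an existing configuration to add one extra argument (`my_new_arg` is an invented name, for illustration only):
```py
from transformers import BertConfig


class MyBertConfig(BertConfig):
    model_type = "my-bert"

    def __init__(self, my_new_arg: int = 4, **kwargs):
        self.my_new_arg = my_new_arg  # invented argument, for illustration only
        super().__init__(**kwargs)
```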
To share your model with the community, follow those steps: first import the ResNet model and config from the newly
created files:
```py
from resnet_model.configuration_resnet import ResnetConfig
from resnet_model.modeling_resnet import ResnetModel, ResnetModelForImageClassification
```
Then you have to tell the library you want to copy the code files of those objects when using the `save_pretrained`
method and properly register them with a given Auto class (especially for models). Just run:
```py
ResnetConfig.register_for_auto_class()
ResnetModel.register_for_auto_class("AutoModel")
ResnetModelForImageClassification.register_for_auto_class("AutoModelForImageClassification")
```
Note that there is no need to specify an auto class for the configuration (there is only one auto class for them,
[`AutoConfig`]) but it's different for models. Your custom model could be suitable for many different tasks, so you
have to specify which one of the auto classes is the correct one for your model.
Next, let's create the config and models as we did before:
```py
resnet50d_config = ResnetConfig(block_type="bottleneck", stem_width=32, stem_type="deep", avg_down=True)
resnet50d = ResnetModelForImageClassification(resnet50d_config)
pretrained_model = timm.create_model("resnet50d", pretrained=True)
resnet50d.model.load_state_dict(pretrained_model.state_dict())
```
Now to send the model to the Hub, make sure you are logged in. Either run in your terminal:
```bash
huggingface-cli login
```
or from a notebook:
```py
from huggingface_hub import notebook_login
notebook_login()
```
You can then push to your own namespace (or an organization you are a member of) like this:
```py
resnet50d.push_to_hub("custom-resnet50d")
```
On top of the modeling weights and the configuration in JSON format, this also copied the modeling and
configuration `.py` files into the folder `custom-resnet50d` and uploaded the result to the Hub. You can check the result
in this [model repo](https://huggingface.co/sgugger/custom-resnet50d).
See the [sharing tutorial](model_sharing) for more information on the push to Hub method.
## Using a model with custom code
You can use any configuration, model or tokenizer with custom code files in its repository with the auto-classes and
the `from_pretrained` method. All files and code uploaded to the Hub are scanned for malware (refer to the [Hub security](https://huggingface.co/docs/hub/security#malware-scanning) documentation for more information), but you should still
review the model code and author to avoid executing malicious code on your machine. Set `trust_remote_code=True` to use
a model with custom code:
```py
from transformers import AutoModelForImageClassification
model = AutoModelForImageClassification.from_pretrained("sgugger/custom-resnet50d", trust_remote_code=True)
```
It is also strongly encouraged to pass a commit hash as a `revision` to make sure the author of the models did not
update the code with some malicious new lines (unless you fully trust the authors of the models).
```py
commit_hash = "ed94a7c6247d8aedce4647f00f20de6875b5b292"
model = AutoModelForImageClassification.from_pretrained(
"sgugger/custom-resnet50d", trust_remote_code=True, revision=commit_hash
)
```
Note that when browsing the commit history of the model repo on the Hub, there is a button to easily copy the commit
hash of any commit.
## Registering a model with custom code to the auto classes
If you are writing a library that extends 🤗 Transformers, you may want to extend the auto classes to include your own
model. This is different from pushing the code to the Hub in the sense that users will need to import your library to
get the custom models (as opposed to automatically downloading the model code from the Hub).
As long as your config has a `model_type` attribute that is different from existing model types, and your model
classes have the right `config_class` attributes, you can just add them to the auto classes like this:
```py
from transformers import AutoConfig, AutoModel, AutoModelForImageClassification
AutoConfig.register("resnet", ResnetConfig)
AutoModel.register(ResnetConfig, ResnetModel)
AutoModelForImageClassification.register(ResnetConfig, ResnetModelForImageClassification)
```
Note that the first argument used when registering your custom config to [`AutoConfig`] needs to match the `model_type`
of your custom config, and the first argument used when registering your custom models to any auto model class needs
to match the `config_class` of those models.
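Once registered, the auto classes can instantiate your custom model like any built-in one. A minimal sketch (assuming the registrations above have been run):
```py
from transformers import AutoModelForImageClassification

config = ResnetConfig(block_type="basic")
model = AutoModelForImageClassification.from_config(config)  # resolves to ResnetModelForImageClassification
```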

docs/source/debugging.mdx Normal file

@ -0,0 +1,335 @@
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Debugging
## Multi-GPU Network Issues Debug
When training or running inference with `DistributedDataParallel` and multiple GPUs, if you run into issues with inter-communication between processes and/or nodes, you can use the following script to diagnose them.
```bash
wget https://raw.githubusercontent.com/huggingface/transformers/master/scripts/distributed/torch-distributed-gpu-test.py
```
For example, to test how 2 GPUs interact, run:
```bash
python -m torch.distributed.run --nproc_per_node 2 --nnodes 1 torch-distributed-gpu-test.py
```
If both processes can talk to each other and allocate GPU memory, each will print an OK status.
For more GPUs or nodes, adjust the arguments in the script.
You will find a lot more details inside the diagnostics script, including a recipe for running it in a SLURM environment.
For an additional level of debugging, add the `NCCL_DEBUG=INFO` environment variable as follows:
```bash
NCCL_DEBUG=INFO python -m torch.distributed.run --nproc_per_node 2 --nnodes 1 torch-distributed-gpu-test.py
```
This will dump a lot of NCCL-related debug information, which you can then search online if problems are reported. If you're not sure how to interpret the output, you can share the log file in an Issue.
## Underflow and Overflow Detection
<Tip>
This feature is currently only available for PyTorch.
</Tip>
<Tip>
For multi-GPU training it requires DDP (`torch.distributed.launch`).
</Tip>
<Tip>
This feature can be used with any `nn.Module`-based model.
</Tip>
If you start getting `loss=NaN` or the model exhibits some other abnormal behavior due to `inf` or `nan` in
activations or weights, you need to discover where the first underflow or overflow happens and what led to it. Luckily,
you can accomplish that easily by activating a special module that will do the detection automatically.
If you're using [`Trainer`], you just need to add:
```bash
--debug underflow_overflow
```
to the normal command line arguments, or pass `debug="underflow_overflow"` when creating the
[`TrainingArguments`] object.
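For example, a minimal sketch (the `output_dir` value here is just a placeholder):
```python
from transformers import TrainingArguments

args = TrainingArguments(output_dir="output", debug="underflow_overflow")
```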
If you're using your own training loop or another Trainer you can accomplish the same with:
```python
from transformers.debug_utils import DebugUnderflowOverflow
debug_overflow = DebugUnderflowOverflow(model)
```
[`~debug_utils.DebugUnderflowOverflow`] inserts hooks into the model that, immediately after each
forward call, test the input and output variables and the corresponding module's weights. As soon as `inf` or
`nan` is detected in at least one element of the activations or weights, the program will assert and print a report
like this (this one was caught with `google/mt5-small` under fp16 mixed precision):
```
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min abs max metadata
encoder.block.1.layer.1.DenseReluDense.dropout Dropout
0.00e+00 2.57e+02 input[0]
0.00e+00 2.85e+02 output
[...]
encoder.block.2.layer.0 T5LayerSelfAttention
6.78e-04 3.15e+03 input[0]
2.65e-04 3.42e+03 output[0]
None output[1]
2.25e-01 1.00e+04 output[2]
encoder.block.2.layer.1.layer_norm T5LayerNorm
8.69e-02 4.18e-01 weight
2.65e-04 3.42e+03 input[0]
1.79e-06 4.65e+00 output
encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
8.08e-07 2.66e+01 weight
1.79e-06 4.65e+00 input[0]
1.27e-04 2.37e+02 output
encoder.block.2.layer.1.DenseReluDense.dropout Dropout
0.00e+00 8.76e+03 input[0]
0.00e+00 9.74e+03 output
encoder.block.2.layer.1.DenseReluDense.wo Linear
1.01e-06 6.44e+00 weight
0.00e+00 9.74e+03 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
1.79e-06 4.65e+00 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.dropout Dropout
3.18e-04 6.27e+04 input[0]
0.00e+00 inf output
```
The example output has been trimmed in the middle for brevity.
The second column shows the value of the absolute largest element, so if you have a closer look at the last few frames,
the inputs and outputs were in the range of `1e4`. So when this training was done under fp16 mixed precision the very
last step overflowed (since under `fp16` the largest number before `inf` is `64e3`). To avoid overflows under
`fp16` the activations must remain way below `1e4`, because `1e4 * 1e4 = 1e8` so any matrix multiplication with
large activations is going to lead to a numerical overflow condition.
At the very start of the trace you can discover at which batch number the problem occurred (here `Detected inf/nan during batch_number=0` means the problem occurred on the first batch).
Each reported frame starts by declaring the fully qualified entry for the corresponding module this frame is reporting
for. If we look just at this frame:
```
encoder.block.2.layer.1.layer_norm T5LayerNorm
8.69e-02 4.18e-01 weight
2.65e-04 3.42e+03 input[0]
1.79e-06 4.65e+00 output
```
Here, `encoder.block.2.layer.1.layer_norm` indicates that it was a layer norm for the first layer of the second
block of the encoder, and the module whose `forward` was called is a `T5LayerNorm`.
Let's look at the last few frames of that report:
```
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min abs max metadata
[...]
encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
8.08e-07 2.66e+01 weight
1.79e-06 4.65e+00 input[0]
1.27e-04 2.37e+02 output
encoder.block.2.layer.1.DenseReluDense.wo Linear
1.01e-06 6.44e+00 weight
0.00e+00 9.74e+03 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
1.79e-06 4.65e+00 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.dropout Dropout
3.18e-04 6.27e+04 input[0]
0.00e+00 inf output
```
The last frame reports on the `Dropout.forward` function, with the first entry for the only input and the second for the
only output. You can see that it was called from the `dropout` attribute inside the `DenseReluDense` class. We can see
that it happened during the first layer of the 2nd block, during the very first batch. Finally, the absolute largest
input element was `6.27e+04` and the output was `inf`.
You can see here that `T5DenseGatedGeluDense.forward` resulted in output activations whose absolute max value was
around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have `Dropout`, which rescales
the remaining elements after zeroing some of them, pushing the absolute max value above 64K and producing an
overflow (`inf`).
As you can see, it's the previous frames that we need to look into when the numbers start getting very large for fp16.
Let's match the report to the code from `models/t5/modeling_t5.py`:
```python
class T5DenseGatedGeluDense(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.wi_0 = nn.Linear(config.d_model, config.d_ff, bias=False)
        self.wi_1 = nn.Linear(config.d_model, config.d_ff, bias=False)
        self.wo = nn.Linear(config.d_ff, config.d_model, bias=False)
        self.dropout = nn.Dropout(config.dropout_rate)
        self.gelu_act = ACT2FN["gelu_new"]

    def forward(self, hidden_states):
        hidden_gelu = self.gelu_act(self.wi_0(hidden_states))
        hidden_linear = self.wi_1(hidden_states)
        hidden_states = hidden_gelu * hidden_linear
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.wo(hidden_states)
        return hidden_states
```
Now it's easy to see the `dropout` call, and all the previous calls as well.
Since the detection is happening in a forward hook, these reports are printed immediately after each `forward`
returns.
Going back to the full report, to act on it and to fix the problem, we need to go a few frames up where the numbers
started to go up and most likely switch to the `fp32` mode here, so that the numbers don't overflow when multiplied
or summed up. Of course, there might be other solutions. For example, we could turn off `amp` temporarily if it's
enabled, after moving the original `forward` into a helper wrapper, like so:
```python
def _forward(self, hidden_states):
    hidden_gelu = self.gelu_act(self.wi_0(hidden_states))
    hidden_linear = self.wi_1(hidden_states)
    hidden_states = hidden_gelu * hidden_linear
    hidden_states = self.dropout(hidden_states)
    hidden_states = self.wo(hidden_states)
    return hidden_states


import torch


def forward(self, hidden_states):
    if torch.is_autocast_enabled():
        with torch.cuda.amp.autocast(enabled=False):
            return self._forward(hidden_states)
    else:
        return self._forward(hidden_states)
```
Since the automatic detector only reports on inputs and outputs of full frames, once you know where to look, you may
want to analyse the intermediary stages of any specific `forward` function as well. In such a case you can use the
`detect_overflow` helper function to inject the detector where you want it, for example:
```python
from transformers.debug_utils import detect_overflow


class T5LayerFF(nn.Module):
    [...]

    def forward(self, hidden_states):
        forwarded_states = self.layer_norm(hidden_states)
        detect_overflow(forwarded_states, "after layer_norm")
        forwarded_states = self.DenseReluDense(forwarded_states)
        detect_overflow(forwarded_states, "after DenseReluDense")
        return hidden_states + self.dropout(forwarded_states)
```
You can see that we added 2 of these and now we track whether `inf` or `nan` for `forwarded_states` was detected
somewhere in between.
Actually, the detector already reports these because each of the calls in the example above is an `nn.Module`, but
if you had some local direct calculations, this is how you'd track them.
Additionally, if you're instantiating the debugger in your own code, you can adjust the number of frames printed from
its default, e.g.:
```python
from transformers.debug_utils import DebugUnderflowOverflow
debug_overflow = DebugUnderflowOverflow(model, max_frames_to_save=100)
```
### Specific batch absolute min and max value tracing
The same debugging class can be used for per-batch tracing with the underflow/overflow detection feature turned off.
Let's say you want to watch the absolute min and max values for all the ingredients of each `forward` call of a given
batch, and only do that for batches 1 and 3. Then you instantiate this class as:
```python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1, 3])
```
And now full batches 1 and 3 will be traced using the same format as the underflow/overflow detector does.
Batches are 0-indexed.
This is helpful if you know that the program starts misbehaving after a certain batch number, so you can fast-forward
right to that area. Here is a sample truncated output for such a configuration:
```
*** Starting batch number=1 ***
abs min abs max metadata
shared Embedding
1.01e-06 7.92e+02 weight
0.00e+00 2.47e+04 input[0]
5.36e-05 7.92e+02 output
[...]
decoder.dropout Dropout
1.60e-07 2.27e+01 input[0]
0.00e+00 2.52e+01 output
decoder T5Stack
not a tensor output
lm_head Linear
1.01e-06 7.92e+02 weight
0.00e+00 1.11e+00 input[0]
6.06e-02 8.39e+01 output
T5ForConditionalGeneration
not a tensor output
*** Starting batch number=3 ***
abs min abs max metadata
shared Embedding
1.01e-06 7.92e+02 weight
0.00e+00 2.78e+04 input[0]
5.36e-05 7.92e+02 output
[...]
```
Here you will get a huge number of frames dumped - as many as there were forward calls in your model - so it may or may
not be what you want, but sometimes it can be easier to use for debugging purposes than a normal debugger. For example,
if a problem starts happening at batch number 150, you can dump traces for batches 149 and 150 and compare where the
numbers started to diverge.
You can also specify the batch number after which to stop the training, with:
```python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1, 3], abort_after_batch_num=3)
```
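If you drive your own training loop, nothing else is needed: the hooks fire on every `forward` call of the model. A minimal sketch, where `model`, `dataloader` and `optimizer` are placeholders for your own objects:
```python
from transformers.debug_utils import DebugUnderflowOverflow

debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1, 3], abort_after_batch_num=3)

for batch in dataloader:  # placeholder dataloader yielding dicts of tensors, including labels
    outputs = model(**batch)  # the debug hooks run and print the traces here
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```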

View File

@ -1,299 +0,0 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Debugging
=======================================================================================================================
Underflow and Overflow Detection
-----------------------------------------------------------------------------------------------------------------------
.. note::
This feature is currently available for PyTorch-only.
.. note::
For multi-GPU training it requires DDP (``torch.distributed.launch``).
.. note::
This feature can be used with any ``nn.Module``-based model.
If you start getting ``loss=NaN`` or the model exhibits some other abnormal behavior due to ``inf`` or ``nan`` in
activations or weights, one needs to discover where the first underflow or overflow happens and what led to it. Luckily
you can accomplish that easily by activating a special module that will do the detection automatically.
If you're using :class:`~transformers.Trainer`, you just need to add:
.. code-block:: bash
--debug underflow_overflow
to the normal command line arguments, or pass ``debug="underflow_overflow"`` when creating the
:class:`~transformers.TrainingArguments` object.
If you're using your own training loop or another Trainer you can accomplish the same with:
.. code-block:: python
from transformers.debug_utils import DebugUnderflowOverflow
debug_overflow = DebugUnderflowOverflow(model)
:class:`~transformers.debug_utils.DebugUnderflowOverflow` inserts hooks into the model that immediately after each
forward call will test input and output variables and also the corresponding module's weights. As soon as ``inf`` or
``nan`` is detected in at least one element of the activations or weights, the program will assert and print a report
like this (this was caught with ``google/mt5-small`` under fp16 mixed precision):
.. code-block::
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min abs max metadata
encoder.block.1.layer.1.DenseReluDense.dropout Dropout
0.00e+00 2.57e+02 input[0]
0.00e+00 2.85e+02 output
[...]
encoder.block.2.layer.0 T5LayerSelfAttention
6.78e-04 3.15e+03 input[0]
2.65e-04 3.42e+03 output[0]
None output[1]
2.25e-01 1.00e+04 output[2]
encoder.block.2.layer.1.layer_norm T5LayerNorm
8.69e-02 4.18e-01 weight
2.65e-04 3.42e+03 input[0]
1.79e-06 4.65e+00 output
encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
8.08e-07 2.66e+01 weight
1.79e-06 4.65e+00 input[0]
1.27e-04 2.37e+02 output
encoder.block.2.layer.1.DenseReluDense.dropout Dropout
0.00e+00 8.76e+03 input[0]
0.00e+00 9.74e+03 output
encoder.block.2.layer.1.DenseReluDense.wo Linear
1.01e-06 6.44e+00 weight
0.00e+00 9.74e+03 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
1.79e-06 4.65e+00 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.dropout Dropout
3.18e-04 6.27e+04 input[0]
0.00e+00 inf output
The example output has been trimmed in the middle for brevity.
The second column shows the value of the absolute largest element, so if you have a closer look at the last few frames,
the inputs and outputs were in the range of ``1e4``. So when this training was done under fp16 mixed precision the very
last step overflowed (since under ``fp16`` the largest number before ``inf`` is ``64e3``). To avoid overflows under
``fp16`` the activations must remain way below ``1e4``, because ``1e4 * 1e4 = 1e8`` so any matrix multiplication with
large activations is going to lead to a numerical overflow condition.
At the very start of the trace you can discover at which batch number the problem occurred (here ``Detected inf/nan
during batch_number=0`` means the problem occurred on the first batch).
Each reported frame starts by declaring the fully qualified entry for the corresponding module this frame is reporting
for. If we look just at this frame:
.. code-block::
encoder.block.2.layer.1.layer_norm T5LayerNorm
8.69e-02 4.18e-01 weight
2.65e-04 3.42e+03 input[0]
1.79e-06 4.65e+00 output
Here, ``encoder.block.2.layer.1.layer_norm`` indicates that it was a layer norm for the first layer of the second
block of the encoder, and that the specific ``forward`` call was for ``T5LayerNorm``.
Let's look at the last few frames of that report:
.. code-block::
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min abs max metadata
[...]
encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
8.08e-07 2.66e+01 weight
1.79e-06 4.65e+00 input[0]
1.27e-04 2.37e+02 output
encoder.block.2.layer.1.DenseReluDense.wo Linear
1.01e-06 6.44e+00 weight
0.00e+00 9.74e+03 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.DenseReluDense T5DenseGatedGeluDense
1.79e-06 4.65e+00 input[0]
3.18e-04 6.27e+04 output
encoder.block.2.layer.1.dropout Dropout
3.18e-04 6.27e+04 input[0]
0.00e+00 inf output
The last frame reports on the ``Dropout.forward`` function, with the first entry for the only input and the second for
the only output. You can see that it was called from the ``dropout`` attribute inside the ``DenseReluDense`` class. We
can see that it happened during the first layer of the 2nd block, during the very first batch. Finally, the absolute
largest input element was ``6.27e+04`` and the same for the output was ``inf``.
You can see here, that ``T5DenseGatedGeluDense.forward`` resulted in output activations, whose absolute max value was
around 62.7K, which is very close to fp16's top limit of 64K. In the next frame we have ``Dropout`` which renormalizes
the weights, after it zeroed some of the elements, which pushes the absolute max value to more than 64K, and we get an
overflow (``inf``).
As you can see, it's the previous frames that we need to look into when the numbers start getting too large for fp16.
Let's match the report to the code from ``models/t5/modeling_t5.py``:
.. code-block:: python
class T5DenseGatedGeluDense(nn.Module):
def __init__(self, config):
super().__init__()
self.wi_0 = nn.Linear(config.d_model, config.d_ff, bias=False)
self.wi_1 = nn.Linear(config.d_model, config.d_ff, bias=False)
self.wo = nn.Linear(config.d_ff, config.d_model, bias=False)
self.dropout = nn.Dropout(config.dropout_rate)
self.gelu_act = ACT2FN["gelu_new"]
def forward(self, hidden_states):
hidden_gelu = self.gelu_act(self.wi_0(hidden_states))
hidden_linear = self.wi_1(hidden_states)
hidden_states = hidden_gelu * hidden_linear
hidden_states = self.dropout(hidden_states)
hidden_states = self.wo(hidden_states)
return hidden_states
Now it's easy to see the ``dropout`` call, and all the previous calls as well.
Since the detection is happening in a forward hook, these reports are printed immediately after each ``forward``
returns.
Going back to the full report, to act on it and fix the problem we need to go up a few frames, to where the numbers
started to grow, and most likely switch to ``fp32`` mode there, so that the numbers don't overflow when multiplied
or summed up. Of course, there might be other solutions. For example, we could turn off ``amp`` temporarily if it's
enabled, after moving the original ``forward`` into a helper wrapper, like so:
.. code-block:: python
def _forward(self, hidden_states):
hidden_gelu = self.gelu_act(self.wi_0(hidden_states))
hidden_linear = self.wi_1(hidden_states)
hidden_states = hidden_gelu * hidden_linear
hidden_states = self.dropout(hidden_states)
hidden_states = self.wo(hidden_states)
return hidden_states
import torch
def forward(self, hidden_states):
if torch.is_autocast_enabled():
with torch.cuda.amp.autocast(enabled=False):
return self._forward(hidden_states)
else:
return self._forward(hidden_states)
Since the automatic detector only reports on inputs and outputs of full frames, once you know where to look, you may
want to analyse the intermediary stages of any specific ``forward`` function as well. In such a case you can use the
``detect_overflow`` helper function to inject the detector where you want it, for example:
.. code-block:: python
from transformers.debug_utils import detect_overflow
class T5LayerFF(nn.Module):
[...]
def forward(self, hidden_states):
forwarded_states = self.layer_norm(hidden_states)
detect_overflow(forwarded_states, "after layer_norm")
forwarded_states = self.DenseReluDense(forwarded_states)
detect_overflow(forwarded_states, "after DenseReluDense")
return hidden_states + self.dropout(forwarded_states)
You can see that we added 2 of these calls and now we track whether ``inf`` or ``nan`` for ``forwarded_states`` was
detected somewhere in between.
Actually, the detector already reports these, because each of the calls in the example above is an ``nn.Module``, but
if you had some local direct calculations, this is how you'd do it.
Additionally, if you're instantiating the debugger in your own code, you can adjust the number of frames printed from
its default, e.g.:
.. code-block:: python
from transformers.debug_utils import DebugUnderflowOverflow
debug_overflow = DebugUnderflowOverflow(model, max_frames_to_save=100)
Specific batch absolute min and max value tracing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The same debugging class can be used for per-batch tracing with the underflow/overflow detection feature turned off.
Let's say you want to watch the absolute min and max values for all the ingredients of each ``forward`` call of a given
batch, and only do that for batches 1 and 3. Then you instantiate this class as:
.. code-block:: python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1,3])
And now full batches 1 and 3 will be traced using the same format as the underflow/overflow detector does.
Batches are 0-indexed.
This is helpful if you know that the program starts misbehaving after a certain batch number, so you can fast-forward
right to that area. Here is a sample truncated output for such configuration:
.. code-block::
*** Starting batch number=1 ***
abs min abs max metadata
shared Embedding
1.01e-06 7.92e+02 weight
0.00e+00 2.47e+04 input[0]
5.36e-05 7.92e+02 output
[...]
decoder.dropout Dropout
1.60e-07 2.27e+01 input[0]
0.00e+00 2.52e+01 output
decoder T5Stack
not a tensor output
lm_head Linear
1.01e-06 7.92e+02 weight
0.00e+00 1.11e+00 input[0]
6.06e-02 8.39e+01 output
T5ForConditionalGeneration
not a tensor output
*** Starting batch number=3 ***
abs min abs max metadata
shared Embedding
1.01e-06 7.92e+02 weight
0.00e+00 2.78e+04 input[0]
5.36e-05 7.92e+02 output
[...]
Here you will get a huge number of frames dumped - as many as there were forward calls in your model - so it may or may
not be what you want, but sometimes it can be easier to use for debugging purposes than a normal debugger. For example,
if a problem starts happening at batch number 150, you can dump traces for batches 149 and 150 and compare where the
numbers started to diverge.
You can also specify the batch number after which to stop the training, with:
.. code-block:: python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1,3], abort_after_batch_num=3)

View File

@ -1 +0,0 @@
../../examples/README.md

View File

@ -0,0 +1,70 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Using tokenizers from 🤗 Tokenizers
The [`PreTrainedTokenizerFast`] depends on the [🤗 Tokenizers](https://huggingface.co/docs/tokenizers) library. The tokenizers obtained from the 🤗 Tokenizers library can be
loaded very simply into 🤗 Transformers.
Before getting into the specifics, let's start by creating a dummy tokenizer in a few lines:
```python
>>> from tokenizers import Tokenizer
>>> from tokenizers.models import BPE
>>> from tokenizers.trainers import BpeTrainer
>>> from tokenizers.pre_tokenizers import Whitespace
>>> tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
>>> trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
>>> tokenizer.pre_tokenizer = Whitespace()
>>> files = [...]
>>> tokenizer.train(files, trainer)
```
We now have a tokenizer trained on the files we defined. We can either continue using it in that runtime, or save it to
a JSON file for future re-use.
## Loading directly from the tokenizer object
Let's see how to leverage this tokenizer object in the 🤗 Transformers library. The
[`PreTrainedTokenizerFast`] class allows for easy instantiation, by accepting the instantiated
*tokenizer* object as an argument:
```python
>>> from transformers import PreTrainedTokenizerFast
>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
```
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to [the tokenizer
page](main_classes/tokenizer) for more information.
## Loading from a JSON file
In order to load a tokenizer from a JSON file, let's first start by saving our tokenizer:
```python
>>> tokenizer.save("tokenizer.json")
```
The path to which we saved this file can be passed to the [`PreTrainedTokenizerFast`] initialization
method using the `tokenizer_file` parameter:
```python
>>> from transformers import PreTrainedTokenizerFast
>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
```
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to [the tokenizer
page](main_classes/tokenizer) for more information.

View File

@ -1,62 +0,0 @@
Using tokenizers from 🤗 Tokenizers
=======================================================================================================================
The :class:`~transformers.PreTrainedTokenizerFast` depends on the `tokenizers
<https://huggingface.co/docs/tokenizers>`__ library. The tokenizers obtained from the 🤗 Tokenizers library can be
loaded very simply into 🤗 Transformers.
Before getting into the specifics, let's start by creating a dummy tokenizer in a few lines:
.. code-block::
>>> from tokenizers import Tokenizer
>>> from tokenizers.models import BPE
>>> from tokenizers.trainers import BpeTrainer
>>> from tokenizers.pre_tokenizers import Whitespace
>>> tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
>>> trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
>>> tokenizer.pre_tokenizer = Whitespace()
>>> files = [...]
>>> tokenizer.train(files, trainer)
We now have a tokenizer trained on the files we defined. We can either continue using it in that runtime, or save it to
a JSON file for future re-use.
Loading directly from the tokenizer object
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's see how to leverage this tokenizer object in the 🤗 Transformers library. The
:class:`~transformers.PreTrainedTokenizerFast` class allows for easy instantiation, by accepting the instantiated
`tokenizer` object as an argument:
.. code-block::
>>> from transformers import PreTrainedTokenizerFast
>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to :doc:`the tokenizer
page <main_classes/tokenizer>` for more information.
Loading from a JSON file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In order to load a tokenizer from a JSON file, let's first start by saving our tokenizer:
.. code-block::
>>> tokenizer.save("tokenizer.json")
The path to which we saved this file can be passed to the :class:`~transformers.PreTrainedTokenizerFast` initialization
method using the :obj:`tokenizer_file` parameter:
.. code-block::
>>> from transformers import PreTrainedTokenizerFast
>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to :doc:`the tokenizer
page <main_classes/tokenizer>` for more information.

300
docs/source/glossary.mdx Normal file
View File

@ -0,0 +1,300 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Glossary
## General terms
- autoencoding models: see MLM
- autoregressive models: see CLM
- CLM: causal language modeling, a pretraining task where the model reads the texts in order and has to predict the
next word. It's usually done by reading the whole sentence but using a mask inside the model to hide the future
tokens at a certain timestep.
- deep learning: machine learning algorithms which use neural networks with several layers.
- MLM: masked language modeling, a pretraining task where the model sees a corrupted version of the texts, usually done
by masking some tokens randomly, and has to predict the original text.
- multimodal: a task that combines texts with another kind of input (for instance images).
- NLG: natural language generation, all tasks related to generating text (for instance talk with transformers,
translation).
- NLP: natural language processing, a generic way to say "deal with texts".
- NLU: natural language understanding, all tasks related to understanding what is in a text (for instance classifying
the whole text, individual words).
- pretrained model: a model that has been pretrained on some data (for instance all of Wikipedia). Pretraining methods
involve a self-supervised objective, which can be reading the text and trying to predict the next word (see CLM) or
masking some words and trying to predict them (see MLM).
- RNN: recurrent neural network, a type of model that uses a loop over a layer to process texts.
- self-attention: each element of the input finds out which other elements of the input they should attend to.
- seq2seq or sequence-to-sequence: models that generate a new sequence from an input, like translation models, or
summarization models (such as [Bart](model_doc/bart) or [T5](model_doc/t5)).
- token: a part of a sentence, usually a word, but can also be a subword (non-common words are often split into
  subwords) or a punctuation symbol.
- transformer: self-attention based deep learning model architecture.
## Model inputs
Every model is different yet bears similarities with the others. Therefore most models use the same inputs, which are
detailed here alongside usage examples.
<a id='input-ids'></a>
### Input IDs
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
numerical representations of tokens building the sequences that will be used as input by the model*.
<Youtube id="VFp38yj8h3A"/>
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
tokenizer, which is a [WordPiece](https://arxiv.org/pdf/1609.08144.pdf) tokenizer:
```python
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence = "A Titan RTX has 24GB of VRAM"
```
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.
```python
>>> tokenized_sequence = tokenizer.tokenize(sequence)
```
The tokens are either words or subwords. Here for instance, "VRAM" wasn't in the model vocabulary, so it's been split
into "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-hash prefix
is added for "RA" and "M":
```python
>>> print(tokenized_sequence)
['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
```
These tokens can then be converted into IDs which are understandable by the model. This can be done by directly feeding
the sentence to the tokenizer, which leverages the Rust implementation of [🤗 Tokenizers](https://github.com/huggingface/tokenizers) for peak performance.
```python
>>> inputs = tokenizer(sequence)
```
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The
token indices are under the key "input_ids":
```python
>>> encoded_sequence = inputs["input_ids"]
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
```
Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them) which are special
IDs the model sometimes uses.
If we decode the previous sequence of ids,
```python
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
```
we will see
```python
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
```
because this is the way a [`BertModel`] is going to expect its inputs.
<a id='attention-mask'></a>
### Attention mask
The attention mask is an optional argument used when batching sequences together.
<Youtube id="M6adb1j2jPI"/>
This argument indicates to the model which tokens should be attended to, and which should not.
For example, consider these two sequences:
```python
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "This is a short sequence."
>>> sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."
>>> encoded_sequence_a = tokenizer(sequence_a)["input_ids"]
>>> encoded_sequence_b = tokenizer(sequence_b)["input_ids"]
```
The encoded versions have different lengths:
```python
>>> len(encoded_sequence_a), len(encoded_sequence_b)
(8, 19)
```
Therefore, we can't put them together in the same tensor as-is. The first sequence needs to be padded up to the length
of the second one, or the second one needs to be truncated down to the length of the first one.
In the first case, the list of IDs will be extended by the padding indices. We can pass a list to the tokenizer and ask
it to pad like this:
```python
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
```
We can see that 0s have been added on the right of the first sentence to make it the same length as the second one:
```python
>>> padded_sequences["input_ids"]
[[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
```
This can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating the
position of the padded indices so that the model does not attend to them. For the [`BertTokenizer`],
`1` indicates a value that should be attended to, while `0` indicates a padded value. This attention mask is
in the dictionary returned by the tokenizer under the key "attention_mask":
```python
>>> padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```
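If you want tensors right away, padding and conversion can be done in one call via the standard `return_tensors` argument (PyTorch chosen here for illustration):
```python
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True, return_tensors="pt")
>>> padded_sequences["input_ids"].shape, padded_sequences["attention_mask"].shape
(torch.Size([2, 19]), torch.Size([2, 19]))
```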
<a id='token-type-ids'></a>
### Token Type IDs
Some models are meant to classify pairs of sentences or to do question answering.
<Youtube id="0u3ioSwev3s"/>
These require two different sequences to be joined in a single "input_ids" entry, which usually is performed with the
help of special tokens, such as the classifier (`[CLS]`) and separator (`[SEP]`) tokens. For example, the BERT
model builds its two sequence input as such:
```python
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
```
We can use our tokenizer to automatically generate such a sentence by passing the two sequences to `tokenizer` as two
arguments (and not a list, like before) like this:
```python
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "HuggingFace is based in NYC"
>>> sequence_b = "Where is HuggingFace based?"
>>> encoded_dict = tokenizer(sequence_a, sequence_b)
>>> decoded = tokenizer.decode(encoded_dict["input_ids"])
```
which will return:
```python
>>> print(decoded)
[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
```
This is enough for some models to understand where one sequence ends and where another begins. However, other models,
such as BERT, also deploy token type IDs (also called segment IDs). They are represented as a binary mask identifying
the two types of sequence in the model.
The tokenizer returns this mask as the "token_type_ids" entry:
```python
>>> encoded_dict["token_type_ids"]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
The first sequence, the "context" used for the question, has all its tokens represented by a `0`, whereas the
second sequence, corresponding to the "question", has all its tokens represented by a `1`.
Some models, like [`XLNetModel`] use an additional token represented by a `2`.
<a id='position-ids'></a>
### Position IDs
Contrary to RNNs that have the position of each token embedded within them, transformers are unaware of the position of
each token. Therefore, the position IDs (`position_ids`) are used by the model to identify each token's position in
the list of tokens.
They are an optional parameter. If no `position_ids` are passed to the model, the IDs are automatically created as
absolute positional embeddings.
Absolute positional embeddings are selected in the range `[0, config.max_position_embeddings - 1]`. Some models use
other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
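As a short sketch (assuming a BERT-style model), the default absolute positions can be built explicitly with `torch.arange` and passed alongside the input IDs:
```python
>>> import torch
>>> from transformers import BertModel, BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> model = BertModel.from_pretrained("bert-base-cased")
>>> inputs = tokenizer("A Titan RTX has 24GB of VRAM", return_tensors="pt")
>>> # 0, 1, ..., seq_length - 1: what the model would create by default
>>> position_ids = torch.arange(inputs["input_ids"].shape[1]).unsqueeze(0)
>>> outputs = model(**inputs, position_ids=position_ids)
```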
<a id='labels'></a>
### Labels
The labels are an optional argument which can be passed in order for the model to compute the loss itself. These labels
should be the expected prediction of the model: it will use the standard loss in order to compute the loss between its
predictions and the expected value (the label).
These labels are different according to the model head, for example:
- For sequence classification models (e.g., [`BertForSequenceClassification`]), the model expects a
tensor of dimension `(batch_size)` with each value of the batch corresponding to the expected label of the
entire sequence.
- For token classification models (e.g., [`BertForTokenClassification`]), the model expects a tensor
of dimension `(batch_size, seq_length)` with each value corresponding to the expected label of each individual
token.
- For masked language modeling (e.g., [`BertForMaskedLM`]), the model expects a tensor of dimension
`(batch_size, seq_length)` with each value corresponding to the expected label of each individual token: the
labels being the token ID for the masked token, and values to be ignored for the rest (usually -100).
- For sequence to sequence tasks (e.g., [`BartForConditionalGeneration`],
[`MBartForConditionalGeneration`]), the model expects a tensor of dimension `(batch_size, tgt_seq_length)` with each value corresponding to the target sequences associated with each input sequence. During
training, both *BART* and *T5* will make the appropriate *decoder_input_ids* and decoder attention masks internally.
They usually do not need to be supplied. This does not apply to models leveraging the Encoder-Decoder framework. See
the documentation of each model for more information on each specific model's labels.
The base models (e.g., [`BertModel`]) do not accept labels, as these are the base transformer
models, simply outputting features.
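For instance, a minimal sequence classification sketch (the checkpoint and the label value here are arbitrary): when `labels` is passed, the loss comes back on the output object:
```python
>>> import torch
>>> from transformers import BertForSequenceClassification, BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
>>> inputs = tokenizer("HuggingFace is based in NYC", return_tensors="pt")
>>> labels = torch.tensor([1])  # one expected label for the whole sequence
>>> loss = model(**inputs, labels=labels).loss
```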
<a id='decoder-input-ids'></a>
### Decoder input IDs
This input is specific to encoder-decoder models, and contains the input IDs that will be fed to the decoder. These
inputs should be used for sequence to sequence tasks, such as translation or summarization, and are usually built in a
way specific to each model.
Most encoder-decoder models (BART, T5) create their `decoder_input_ids` on their own from the `labels`. In
such models, passing the `labels` is the preferred way to handle training.
Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
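A brief sketch with T5 (checkpoint name assumed): passing `labels` is enough, and the `decoder_input_ids` are derived internally by shifting the labels:
```python
>>> from transformers import T5ForConditionalGeneration, T5Tokenizer

>>> tokenizer = T5Tokenizer.from_pretrained("t5-small")
>>> model = T5ForConditionalGeneration.from_pretrained("t5-small")
>>> inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
>>> labels = tokenizer("Das Haus ist wunderbar.", return_tensors="pt")["input_ids"]
>>> loss = model(**inputs, labels=labels).loss  # decoder_input_ids are created from labels
```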
<a id='feed-forward-chunking'></a>
### Feed Forward Chunking
In each residual attention block in transformers the self-attention layer is usually followed by 2 feed forward layers.
The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (e.g., for
`bert-base-uncased`).
For an input of size `[batch_size, sequence_length]`, the memory required to store the intermediate feed forward
embeddings `[batch_size, sequence_length, config.intermediate_size]` can account for a large fraction of the memory
use. The authors of [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) noticed that since the
computation is independent of the `sequence_length` dimension, it is mathematically equivalent to compute the output
embeddings of both feed forward layers `[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n`
individually and concat them afterward to `[batch_size, sequence_length, config.hidden_size]` with `n = sequence_length`, which trades increased computation time against reduced memory use, but yields a mathematically
**equivalent** result.
For models employing the function [`apply_chunking_to_forward`], the `chunk_size` defines the
number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time
complexity. If `chunk_size` is set to 0, no feed forward chunking is done.

View File

@ -1,322 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Glossary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
General terms
-----------------------------------------------------------------------------------------------------------------------
- autoencoding models: see MLM
- autoregressive models: see CLM
- CLM: causal language modeling, a pretraining task where the model reads the texts in order and has to predict the
next word. It's usually done by reading the whole sentence but using a mask inside the model to hide the future
tokens at a certain timestep.
- deep learning: machine learning algorithms which use neural networks with several layers.
- MLM: masked language modeling, a pretraining task where the model sees a corrupted version of the texts, usually done
by masking some tokens randomly, and has to predict the original text.
- multimodal: a task that combines texts with another kind of input (for instance images).
- NLG: natural language generation, all tasks related to generating text (for instance talk with transformers,
translation).
- NLP: natural language processing, a generic way to say "deal with texts".
- NLU: natural language understanding, all tasks related to understanding what is in a text (for instance classifying
the whole text, individual words).
- pretrained model: a model that has been pretrained on some data (for instance all of Wikipedia). Pretraining methods
involve a self-supervised objective, which can be reading the text and trying to predict the next word (see CLM) or
masking some words and trying to predict them (see MLM).
- RNN: recurrent neural network, a type of model that uses a loop over a layer to process texts.
- self-attention: each element of the input finds out which other elements of the input they should attend to.
- seq2seq or sequence-to-sequence: models that generate a new sequence from an input, like translation models, or
summarization models (such as :doc:`Bart </model_doc/bart>` or :doc:`T5 </model_doc/t5>`).
- token: a part of a sentence, usually a word, but can also be a subword (non-common words are often split into
  subwords) or a punctuation symbol.
- transformer: self-attention based deep learning model architecture.
Model inputs
-----------------------------------------------------------------------------------------------------------------------
Every model is different yet bears similarities with the others. Therefore most models use the same inputs, which are
detailed here alongside usage examples.
.. _input-ids:
Input IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
numerical representations of tokens building the sequences that will be used as input by the model*.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/VFp38yj8h3A" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
tokenizer, which is a `WordPiece <https://arxiv.org/pdf/1609.08144.pdf>`__ tokenizer:
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence = "A Titan RTX has 24GB of VRAM"
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.
.. code-block::
>>> tokenized_sequence = tokenizer.tokenize(sequence)
The tokens are either words or subwords. Here for instance, "VRAM" wasn't in the model vocabulary, so it's been split
into "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-hash prefix
is added for "RA" and "M":
.. code-block::
>>> print(tokenized_sequence)
['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
These tokens can then be converted into IDs which are understandable by the model. This can be done by directly feeding
the sentence to the tokenizer, which leverages the Rust implementation of `huggingface/tokenizers
<https://github.com/huggingface/tokenizers>`__ for peak performance.
.. code-block::
>>> inputs = tokenizer(sequence)
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The
token indices are under the key "input_ids":
.. code-block::
>>> encoded_sequence = inputs["input_ids"]
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them) which are special
IDs the model sometimes uses.
If we decode the previous sequence of ids,
.. code-block::
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
we will see
.. code-block::
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
because this is the way a :class:`~transformers.BertModel` is going to expect its inputs.
.. _attention-mask:
Attention mask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The attention mask is an optional argument used when batching sequences together.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/M6adb1j2jPI" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
This argument indicates to the model which tokens should be attended to, and which should not.
For example, consider these two sequences:
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "This is a short sequence."
>>> sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."
>>> encoded_sequence_a = tokenizer(sequence_a)["input_ids"]
>>> encoded_sequence_b = tokenizer(sequence_b)["input_ids"]
The encoded versions have different lengths:
.. code-block::
>>> len(encoded_sequence_a), len(encoded_sequence_b)
(8, 19)
Therefore, we can't put them together in the same tensor as-is. The first sequence needs to be padded up to the length
of the second one, or the second one needs to be truncated down to the length of the first one.
In the first case, the list of IDs will be extended by the padding indices. We can pass a list to the tokenizer and ask
it to pad like this:
.. code-block::
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
We can see that 0s have been added on the right of the first sentence to make it the same length as the second one:
.. code-block::
>>> padded_sequences["input_ids"]
[[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
This can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating the
position of the padded indices so that the model does not attend to them. For the :class:`~transformers.BertTokenizer`,
:obj:`1` indicates a value that should be attended to, while :obj:`0` indicates a padded value. This attention mask is
in the dictionary returned by the tokenizer under the key "attention_mask":
.. code-block::
>>> padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
.. _token-type-ids:
Token Type IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some models are meant to classify pairs of sentences or to do question answering.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/0u3ioSwev3s" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
These require two different sequences to be joined in a single "input_ids" entry, which usually is performed with the
help of special tokens, such as the classifier (``[CLS]``) and separator (``[SEP]``) tokens. For example, the BERT
model builds its two sequence input as such:
.. code-block::
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
We can use our tokenizer to automatically generate such a sentence by passing the two sequences to ``tokenizer`` as two
arguments (and not a list, like before) like this:
.. code-block::
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "HuggingFace is based in NYC"
>>> sequence_b = "Where is HuggingFace based?"
>>> encoded_dict = tokenizer(sequence_a, sequence_b)
>>> decoded = tokenizer.decode(encoded_dict["input_ids"])
which will return:
.. code-block::
>>> print(decoded)
[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
This is enough for some models to understand where one sequence ends and where another begins. However, other models,
such as BERT, also deploy token type IDs (also called segment IDs). They are represented as a binary mask identifying
the two types of sequence in the model.
The tokenizer returns this mask as the "token_type_ids" entry:
.. code-block::
>>> encoded_dict['token_type_ids']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
The first sequence, the "context" used for the question, has all its tokens represented by a :obj:`0`, whereas the
second sequence, corresponding to the "question", has all its tokens represented by a :obj:`1`.
Some models, like :class:`~transformers.XLNetModel` use an additional token represented by a :obj:`2`.
.. _position-ids:
Position IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Contrary to RNNs that have the position of each token embedded within them, transformers are unaware of the position of
each token. Therefore, the position IDs (``position_ids``) are used by the model to identify each token's position in
the list of tokens.
They are an optional parameter. If no ``position_ids`` are passed to the model, the IDs are automatically created as
absolute positional embeddings.
Absolute positional embeddings are selected in the range ``[0, config.max_position_embeddings - 1]``. Some models use
other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
.. _labels:
Labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The labels are an optional argument which can be passed in order for the model to compute the loss itself. These labels
should be the expected prediction of the model: it will use the standard loss in order to compute the loss between its
predictions and the expected value (the label).
These labels are different according to the model head, for example:
- For sequence classification models (e.g., :class:`~transformers.BertForSequenceClassification`), the model expects a
tensor of dimension :obj:`(batch_size)` with each value of the batch corresponding to the expected label of the
entire sequence.
- For token classification models (e.g., :class:`~transformers.BertForTokenClassification`), the model expects a tensor
of dimension :obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each individual
token.
- For masked language modeling (e.g., :class:`~transformers.BertForMaskedLM`), the model expects a tensor of dimension
:obj:`(batch_size, seq_length)` with each value corresponding to the expected label of each individual token: the
labels being the token ID for the masked token, and values to be ignored for the rest (usually -100).
- For sequence to sequence tasks (e.g., :class:`~transformers.BartForConditionalGeneration`,
:class:`~transformers.MBartForConditionalGeneration`), the model expects a tensor of dimension :obj:`(batch_size,
tgt_seq_length)` with each value corresponding to the target sequences associated with each input sequence. During
training, both `BART` and `T5` will make the appropriate `decoder_input_ids` and decoder attention masks internally.
They usually do not need to be supplied. This does not apply to models leveraging the Encoder-Decoder framework. See
the documentation of each model for more information on each specific model's labels.
The base models (e.g., :class:`~transformers.BertModel`) do not accept labels, as these are the base transformer
models, simply outputting features.
.. _decoder-input-ids:
Decoder input IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This input is specific to encoder-decoder models, and contains the input IDs that will be fed to the decoder. These
inputs should be used for sequence to sequence tasks, such as translation or summarization, and are usually built in a
way specific to each model.
Most encoder-decoder models (BART, T5) create their :obj:`decoder_input_ids` on their own from the :obj:`labels`. In
such models, passing the :obj:`labels` is the preferred way to handle training.
Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
.. _feed-forward-chunking:
Feed Forward Chunking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In each residual attention block in transformers the self-attention layer is usually followed by 2 feed forward layers.
The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (e.g., for
``bert-base-uncased``).
For an input of size ``[batch_size, sequence_length]``, the memory required to store the intermediate feed forward
embeddings ``[batch_size, sequence_length, config.intermediate_size]`` can account for a large fraction of the memory
use. The authors of `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`_ noticed that since the
computation is independent of the ``sequence_length`` dimension, it is mathematically equivalent to compute the output
embeddings of both feed forward layers ``[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n``
individually and concat them afterward to ``[batch_size, sequence_length, config.hidden_size]`` with ``n =
sequence_length``, which trades increased computation time against reduced memory use, but yields a mathematically
**equivalent** result.
For models employing the function :func:`~.transformers.apply_chunking_to_forward`, the ``chunk_size`` defines the
number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time
complexity. If ``chunk_size`` is set to 0, no feed forward chunking is done.

Binary files not shown (24 deleted documentation images).

View File

@ -12,25 +12,18 @@ specific language governing permissions and limitations under the License.
# 🤗 Transformers # 🤗 Transformers
State-of-the-art Machine Learning for Jax, Pytorch and TensorFlow State-of-the-art Machine Learning for PyTorch, TensorFlow and JAX.
🤗 Transformers (formerly known as _pytorch-transformers_ and _pytorch-pretrained-bert_) provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio. 🤗 Transformers provides APIs to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs, carbon footprint, and save you time from training a model from scratch. The models can be used across different modalities such as:
These models can applied on: * 📝 Text: text classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages.
* 🖼️ Images: image classification, object detection, and segmentation.
* 🗣️ Audio: speech recognition and audio classification.
* 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
* 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages. Our library supports seamless integration between three of the most popular deep learning libraries: [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/) and [JAX](https://jax.readthedocs.io/en/latest/). Train your model in three lines of code in one framework, and load it for inference with another.
* 🖼️ Images, for tasks like image classification, object detection, and segmentation.
* 🗣️ Audio, for tasks like speech recognition and audio classification.
Transformer models can also perform tasks on **several modalities combined**, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering. Each 🤗 Transformers architecture is defined in a standalone Python module so they can be easily customized for research and experiments.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
🤗 Transformers is backed by the three most popular deep learning libraries — [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) — with a seamless integration between them. It's straightforward to train your models with one before loading them for inference with the other.
This is the documentation of our repository [transformers](https://github.com/huggingface/transformers). You can
also follow our [online course](https://huggingface.co/course) that teaches how to use this library, as well as the
other libraries developed by Hugging Face and the Hub.
## If you are looking for custom support from the Hugging Face team ## If you are looking for custom support from the Hugging Face team
@ -38,35 +31,6 @@ other libraries developed by Hugging Face and the Hub.
<img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);"> <img alt="HuggingFace Expert Acceleration Program" src="https://huggingface.co/front/thumbnails/support.png" style="max-width: 600px; border: 1px solid #eee; border-radius: 4px; box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05);">
</a><br> </a><br>
## Features
1. Easy-to-use state-of-the-art models:
- High performance on natural language understanding & generation, computer vision, and audio tasks.
- Low barrier to entry for educators and practitioners.
- Few user-facing abstractions with just three classes to learn.
- A unified API for using all our pretrained models.
1. Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Dozens of architectures with over 20,000 pretrained models, some in more than 100 languages.
1. Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch/JAX frameworks at will.
- Seamlessly pick the right framework for training, evaluation and production.
1. Easily customize a model or an example to your needs:
- We provide examples for each architecture to reproduce the results published by its original authors.
- Model internals are exposed as consistently as possible.
- Model files can be used independently of the library for quick experiments.
[All the model checkpoints](https://huggingface.co/models) are seamlessly integrated from the huggingface.co [model
hub](https://huggingface.co) where they are uploaded directly by [users](https://huggingface.co/users) and
[organizations](https://huggingface.co/organizations).
Current number of checkpoints: <img src="https://img.shields.io/endpoint?url=https://huggingface.co/api/shields/models&color=brightgreen">
## Contents ## Contents
The documentation is organized in five parts: The documentation is organized in five parts:
@ -91,38 +55,40 @@ conversion utilities for the following models.
<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! --> <!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! -->
1. **[ALBERT](model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BERT For Sequence Generation](model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[MBart](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MBart-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[REALM](https://huggingface.co/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBert](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[Transformer-XL](model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[ViLT](model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/master/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[XGLM](https://huggingface.co/docs/master/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[XLS-R](https://huggingface.co/docs/master/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
### Supported frameworks
The table below represents the current support in the library for each of those models, whether they have a Python tokenizer (called "slow"), a "fast" tokenizer backed by the 🤗 Tokenizers library, and whether they have support in Jax (via Flax), PyTorch, and/or TensorFlow.
<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!-->
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
|:---------------------------:|:--------------:|:--------------:|:---------------:|:------------------:|:------------:|
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ | ✅ |
| BEiT | ❌ | ❌ | ✅ | ❌ | ✅ |
| BlenderbotSmall | ✅ | ✅ | ✅ | ✅ | ✅ |
| CamemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| Canine | ✅ | ❌ | ✅ | ❌ | ❌ |
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ |
| ConvBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| ConvNext | ❌ | ❌ | ✅ | ✅ | ❌ |
| CTRL | ✅ | ❌ | ✅ | ✅ | ❌ |
| Data2VecAudio | ❌ | ❌ | ✅ | ❌ | ❌ |
| Data2VecText | ❌ | ❌ | ✅ | ❌ | ❌ |
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
| DeBERTa-v2 | ✅ | ❌ | ✅ | ✅ | ❌ |
| DeiT | ❌ | ❌ | ✅ | ❌ | ❌ |
| LXMERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| M2M100 | ✅ | ❌ | ✅ | ❌ | ❌ |
| Marian | ✅ | ❌ | ✅ | ✅ | ✅ |
| MaskFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| mBART | ✅ | ✅ | ✅ | ✅ | ✅ |
| MegatronBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| MobileBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| MPNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| mT5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Nystromformer | ❌ | ❌ | ✅ | ❌ | ❌ |
| OpenAI GPT | ✅ | ✅ | ✅ | ✅ | ❌ |
| OpenAI GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ |
| Pegasus | ✅ | ✅ | ✅ | ✅ | ✅ |
| Perceiver | ✅ | ❌ | ✅ | ❌ | ❌ |
| PLBart | ✅ | ❌ | ✅ | ❌ | ❌ |
| PoolFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| ProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| QDQBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| RAG | ✅ | ❌ | ✅ | ✅ | ❌ |
| Realm | ✅ | ✅ | ✅ | ❌ | ❌ |
| Reformer | ✅ | ✅ | ✅ | ❌ | ❌ |
| RemBERT | ✅ | ✅ | ✅ | ✅ | ❌ |
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ |
| SegFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ |
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| Speech2Text | ✅ | ❌ | ✅ | ✅ | ❌ |
| Speech2Text2 | ✅ | ❌ | ❌ | ❌ | ❌ |
| Splinter | ✅ | ✅ | ✅ | ❌ | ❌ |
| SqueezeBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| Swin | ❌ | ❌ | ✅ | ❌ | ❌ |
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ |
| TAPAS | ✅ | ❌ | ✅ | ✅ | ❌ |
| Transformer-XL | ✅ | ❌ | ✅ | ✅ | ❌ |
| TrOCR | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeech | ❌ | ❌ | ✅ | ❌ | ❌ |
| UniSpeechSat | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViLT | ❌ | ❌ | ✅ | ❌ | ❌ |
| Vision Encoder decoder | ❌ | ❌ | ✅ | ✅ | ✅ |
| VisionTextDualEncoder | ❌ | ❌ | ✅ | ❌ | ✅ |
| VisualBert | ❌ | ❌ | ✅ | ❌ | ❌ |
| ViT | ❌ | ❌ | ✅ | ✅ | ✅ |
| ViTMAE | ❌ | ❌ | ✅ | ❌ | ❌ |
| Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ |
| WavLM | ❌ | ❌ | ✅ | ❌ | ❌ |
| XGLM | ✅ | ✅ | ✅ | ❌ | ✅ |
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| XLM-RoBERTa-XL | ❌ | ❌ | ✅ | ❌ | ❌ |
| XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |
<!-- End table-->


<!---
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Installation
🤗 Transformers is tested on Python 3.6+, and PyTorch 1.1.0+ or TensorFlow 2.0+.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're
unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Create a virtual environment with the version of Python you're going
to use and activate it.
Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you
must install it from source.
## Installation with pip
First, you will need to install at least one of TensorFlow 2.0, PyTorch, or Flax.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available),
[PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or
[Flax installation page](https://github.com/google/flax#quick-install)
regarding the specific install command for your platform.
When TensorFlow 2.0 and/or PyTorch have been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install transformers
```
Alternatively, for CPU support only, you can install 🤗 Transformers and PyTorch in one line with:
```bash
pip install transformers[torch]
```
or 🤗 Transformers and TensorFlow 2.0 in one line with:
```bash
pip install transformers[tf-cpu]
```
or 🤗 Transformers and Flax in one line with:
```bash
pip install transformers[flax]
```
To check that 🤗 Transformers is properly installed, run the following command:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
```
It should download a pretrained model and then print something like
```bash
[{'label': 'POSITIVE', 'score': 0.9998704791069031}]
```
(Note that TensorFlow will print additional log messages before that last statement.)
## Installing from source
Here is how to quickly install `transformers` from source:
```bash
pip install git+https://github.com/huggingface/transformers
```
Note that this installs not the latest released version, but the bleeding edge `master` version, which you may want to use if a bug has been fixed since the last official release but a new release hasn't been rolled out yet.
While we strive to keep `master` operational at all times, if you notice some issues, they usually get fixed within a few hours or a day. You're more than welcome to help us detect any problems by opening an [Issue](https://github.com/huggingface/transformers/issues); this way, things will get fixed even sooner.
Again, you can run:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I hate you'))"
```
to check 🤗 Transformers is properly installed.
## Editable install
If you want to constantly use the bleeding edge `master` version of the source code, or if you want to contribute to the library and need to test the changes in the code you're making, you will need an editable install. This is done by cloning the repository and installing with the following commands:
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```
This command links the folder you cloned the repository into with your Python library paths: Python will look inside this folder in addition to the normal library-wide paths. So if your Python packages normally get installed into:
```
~/anaconda3/envs/main/lib/python3.7/site-packages/
```
the editable install will instead reside wherever you cloned the folder to, e.g. `~/transformers/`, and Python will search that folder too.
Do note that you have to keep the `transformers` folder around, and not delete it, to continue using the `transformers` library.
Now, let's get to the real benefit of this installation approach. Say you see that a new feature has just been committed to `master`. If you have already performed all the steps above, then to update your transformers installation to include all the latest commits, all you need to do is `cd` into the cloned repository folder and update the clone to the latest version:
```bash
cd ~/transformers/
git pull
```
There is nothing else to do. Your python environment will find the bleeding edge version of `transformers` on the next run.
## With conda
Since Transformers version v4.0.0, we now have a conda channel: `huggingface`.
🤗 Transformers can be installed using conda as follows:
```bash
conda install -c huggingface transformers
```
Follow the installation pages of TensorFlow, PyTorch or Flax to see how to install them with conda.
## Caching models
This library provides pretrained models that will be downloaded and cached locally. Unless you specify a location with
`cache_dir=...` when you use methods like `from_pretrained`, these models will automatically be downloaded to the
folder given by the shell environment variable ``TRANSFORMERS_CACHE``. Its default value is the Hugging
Face cache home followed by ``/transformers/``. This is (by order of priority):
* shell environment variable ``HF_HOME``
* shell environment variable ``XDG_CACHE_HOME`` + ``/huggingface/``
* default: ``~/.cache/huggingface/``
So if you don't have any specific environment variable set, the cache directory will be at
``~/.cache/huggingface/transformers/``.
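You can also bypass the cache directory logic entirely for a given download by passing `cache_dir` explicitly. Here is a minimal sketch, assuming a hypothetical cache path:
```python
from transformers import AutoModel, AutoTokenizer

# Both the tokenizer and model files are cached under the given directory
# instead of the default TRANSFORMERS_CACHE location.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", cache_dir="/path/to/my/cache")
model = AutoModel.from_pretrained("bert-base-uncased", cache_dir="/path/to/my/cache")
```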
**Note:** If you have set a shell environment variable for one of the predecessors of this library
(``PYTORCH_TRANSFORMERS_CACHE`` or ``PYTORCH_PRETRAINED_BERT_CACHE``), those will be used if there is no shell
environment variable for ``TRANSFORMERS_CACHE``.
### Offline mode
It's possible to run 🤗 Transformers in a firewalled or a no-network environment.
Setting environment variable `TRANSFORMERS_OFFLINE=1` will tell 🤗 Transformers to use local files only and will not try to look things up.
You will most likely want to couple this with `HF_DATASETS_OFFLINE=1`, which does the same for 🤗 Datasets if you're using the latter.
Here is an example of how this can be used on a filesystem that is shared between a normally networked instance and an instance that is firewalled from the external world.
On the instance with normal network access, run your program, which will download and cache models (and optionally datasets if you use 🤗 Datasets). For example:
```bash
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
and then with the same filesystem you can now run the same program on a firewalled instance:
```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
and it should succeed without hanging while waiting to time out.
#### Fetching models and tokenizers to use offline
When running a script for the first time as mentioned above, the downloaded files will be cached for future reuse.
However, it is also possible to download files and point to their local path instead.
Downloading files can be done through the Web Interface by clicking on the "Download" button, but it can also be handled
programmatically using the `huggingface_hub` library, which is a dependency of `transformers`:
- Using `snapshot_download` to download an entire repository
- Using `hf_hub_download` to download a specific file
See the reference for these methods in the huggingface_hub
[documentation](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub).
## Do you want to run a Transformer model on a mobile device?
You should check out our [swift-coreml-transformers](https://github.com/huggingface/swift-coreml-transformers) repo.
It contains a set of tools to convert PyTorch or TensorFlow 2.0 trained Transformer models (currently contains `GPT-2`,
`DistilGPT-2`, `BERT`, and `DistilBERT`) to CoreML models that run on iOS devices.
At some point in the future, you'll be able to seamlessly move from pretraining or fine-tuning models in PyTorch or
TensorFlow 2.0 to productizing them in CoreML, or prototype a model or an app in CoreML then research its
hyperparameters or architecture from PyTorch or TensorFlow 2.0. Super exciting!

View File

@ -0,0 +1,235 @@
<!---
Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Installation
Install 🤗 Transformers for whichever deep learning library you're working with, set up your cache, and optionally configure 🤗 Transformers to run offline.
🤗 Transformers is tested on Python 3.6+, PyTorch 1.1.0+, TensorFlow 2.0+, and Flax. Follow the installation instructions below for the deep learning library you are using:
* [PyTorch](https://pytorch.org/get-started/locally/) installation instructions.
* [TensorFlow 2.0](https://www.tensorflow.org/install/pip) installation instructions.
* [Flax](https://flax.readthedocs.io/en/latest/) installation instructions.
## Install with pip
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). A virtual environment makes it easier to manage different projects, and avoid compatibility issues between dependencies.
Start by creating a virtual environment in your project directory:
```bash
python -m venv .env
```
Activate the virtual environment:
```bash
source .env/bin/activate
```
Now you're ready to install 🤗 Transformers with the following command:
```bash
pip install transformers
```
For CPU support only, you can conveniently install 🤗 Transformers and a deep learning library in one line. For example, install 🤗 Transformers and PyTorch with:
```bash
pip install transformers[torch]
```
🤗 Transformers and TensorFlow 2.0:
```bash
pip install transformers[tf-cpu]
```
🤗 Transformers and Flax:
```bash
pip install transformers[flax]
```
Finally, check if 🤗 Transformers has been properly installed by running the following command. It will download a pretrained model:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
```
Then it will print out the label and score:
```bash
[{'label': 'POSITIVE', 'score': 0.9998704791069031}]
```
## Install from source
Install 🤗 Transformers from source with the following command:
```bash
pip install git+https://github.com/huggingface/transformers
```
This command installs the bleeding edge `master` version rather than the latest `stable` version. The `master` version is useful for staying up-to-date with the latest developments, for instance if a bug has been fixed since the last official release but a new release hasn't been rolled out yet. However, this means the `master` version may not always be stable. We strive to keep the `master` version operational, and most issues are usually resolved within a few hours or a day. If you run into a problem, please open an [Issue](https://github.com/huggingface/transformers/issues) so we can fix it even sooner!
Check if 🤗 Transformers has been properly installed by running the following command:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I love you'))"
```
## Editable install
You will need an editable install if you'd like to:
* Use the `master` version of the source code.
* Contribute to 🤗 Transformers and need to test changes in the code.
Clone the repository and install 🤗 Transformers with the following commands:
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```
These commands link the folder you cloned the repository to with your Python library paths. Python will now look inside the folder you cloned to in addition to the normal library paths. For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.7/site-packages/`, Python will also search the folder you cloned to: `~/transformers/`.
<Tip warning={true}>
You must keep the `transformers` folder if you want to keep using the library.
</Tip>
Now you can easily update your clone to the latest version of 🤗 Transformers with the following command:
```bash
cd ~/transformers/
git pull
```
Your Python environment will find the `master` version of 🤗 Transformers on the next run.
## Install with conda
Install from the conda channel `huggingface`:
```bash
conda install -c huggingface transformers
```
## Cache setup
Pretrained models are downloaded and locally cached at: `~/.cache/huggingface/transformers/`. This is the default directory given by the shell environment variable `TRANSFORMERS_CACHE`. On Windows, the default directory is given by `C:\Users\username\.cache\huggingface\transformers`. You can change the shell environment variables shown below - in order of priority - to specify a different cache directory:
1. Shell environment variable (default): `TRANSFORMERS_CACHE`.
2. Shell environment variable: `HF_HOME` + `/transformers/`.
3. Shell environment variable: `XDG_CACHE_HOME` + `/huggingface/transformers`.
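For example, here is a minimal sketch of overriding the cache location from Python; the path is only an illustration, and the variable must be set before 🤗 Transformers is imported:
```py
>>> import os
>>> os.environ["TRANSFORMERS_CACHE"] = "/path/to/my/cache"  # set before importing transformers
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("bert-base-uncased")  # cached under /path/to/my/cache
```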
<Tip>
🤗 Transformers will use the shell environment variables `PYTORCH_TRANSFORMERS_CACHE` or `PYTORCH_PRETRAINED_BERT_CACHE` if you are coming from an earlier iteration of this library and have set those environment variables, unless you specify the shell environment variable `TRANSFORMERS_CACHE`.
</Tip>
## Offline mode
🤗 Transformers is able to run in a firewalled or offline environment by only using local files. Set the environment variable `TRANSFORMERS_OFFLINE=1` to enable this behavior.
<Tip>
Add [🤗 Datasets](https://huggingface.co/docs/datasets/) to your offline training workflow by setting the environment variable `HF_DATASETS_OFFLINE=1`.
</Tip>
For example, you would typically first run a program on a normally networked instance to download and cache the model files, with the following command:
```bash
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
Run this same program in an offline instance with:
```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 \
python examples/pytorch/translation/run_translation.py --model_name_or_path t5-small --dataset_name wmt16 --dataset_config ro-en ...
```
The script should now run without hanging or waiting to time out because it knows to look only for local files.
### Fetch models and tokenizers to use offline
Another option for using 🤗 Transformers offline is to download the files ahead of time, and then point to their local path when you need to use them offline. There are three ways to do this:
* Download a file through the user interface on the [Model Hub](https://huggingface.co/models) by clicking on the ↓ icon.
![download-icon](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/download-icon.png)
* Use the [`PreTrainedModel.from_pretrained`] and [`PreTrainedModel.save_pretrained`] workflow:
1. Download your files ahead of time with [`PreTrainedModel.from_pretrained`]:
```py
>>> from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
>>> tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")
```
2. Save your files to a specified directory with [`PreTrainedModel.save_pretrained`]:
```py
>>> tokenizer.save_pretrained("./your/path/bigscience_t0")
>>> model.save_pretrained("./your/path/bigscience_t0")
```
3. Now when you're offline, reload your files with [`PreTrainedModel.from_pretrained`] from the specified directory:
```py
>>> tokenizer = AutoTokenizer.from_pretrained("./your/path/bigscience_t0")
>>> model = AutoModelForSeq2SeqLM.from_pretrained("./your/path/bigscience_t0")
```
* Programmatically download files with the [huggingface_hub](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub) library:
1. Install the `huggingface_hub` library in your virtual environment:
```bash
python -m pip install huggingface_hub
```
2. Use the [`hf_hub_download`](https://huggingface.co/docs/hub/adding-a-library#download-files-from-the-hub) function to download a file to a specific path. For example, the following command downloads the `config.json` file from the [T0](https://huggingface.co/bigscience/T0_3B) model to your desired path:
```py
>>> from huggingface_hub import hf_hub_download
>>> hf_hub_download(repo_id="bigscience/T0_3B", filename="config.json", cache_dir="./your/path/bigscience_t0")
```
Once your file is downloaded and locally cached, specify its local path to load and use it:
```py
>>> from transformers import AutoConfig
>>> config = AutoConfig.from_pretrained("./your/path/bigscience_t0/config.json")
```
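If you need every file in a repository rather than a single one, `snapshot_download` from the same library can mirror the whole repository; a minimal sketch, with the target directory as an example:
```py
>>> from huggingface_hub import snapshot_download
>>> snapshot_download(repo_id="bigscience/T0_3B", cache_dir="./your/path/bigscience_t0")
```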
<Tip>
See the [How to download files from the Hub](https://huggingface.co/docs/hub/how-to-downstream) section for more details on downloading files stored on the Hub.
</Tip>

View File

@ -0,0 +1,46 @@
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# General Utilities
This page lists all of Transformers' general utility functions that are found in the file `file_utils.py`.
Most of those are only useful if you are studying the general code in the library.
## Enums and namedtuples
[[autodoc]] file_utils.ExplicitEnum
[[autodoc]] file_utils.PaddingStrategy
[[autodoc]] file_utils.TensorType
## Special Decorators
[[autodoc]] file_utils.add_start_docstrings
[[autodoc]] file_utils.add_start_docstrings_to_model_forward
[[autodoc]] file_utils.add_end_docstrings
[[autodoc]] file_utils.add_code_sample_docstrings
[[autodoc]] file_utils.replace_return_docstrings
## Special Properties
[[autodoc]] file_utils.cached_property
## Other Utilities
[[autodoc]] file_utils._LazyModule

View File

@ -1,54 +0,0 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
General Utilities
-----------------------------------------------------------------------------------------------------------------------
This page lists all of Transformers general utility functions that are found in the file ``file_utils.py``.
Most of those are only useful if you are studying the general code in the library.
Enums and namedtuples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.ExplicitEnum
.. autoclass:: transformers.file_utils.PaddingStrategy
.. autoclass:: transformers.file_utils.TensorType
Special Decorators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.file_utils.add_start_docstrings
.. autofunction:: transformers.file_utils.add_start_docstrings_to_model_forward
.. autofunction:: transformers.file_utils.add_end_docstrings
.. autofunction:: transformers.file_utils.add_code_sample_docstrings
.. autofunction:: transformers.file_utils.replace_return_docstrings
Special Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils.cached_property
Other Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.file_utils._LazyModule

View File

@ -0,0 +1,254 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Utilities for Generation
This page lists all the utility functions used by [`~generation_utils.GenerationMixin.generate`],
[`~generation_utils.GenerationMixin.greedy_search`],
[`~generation_utils.GenerationMixin.sample`],
[`~generation_utils.GenerationMixin.beam_search`],
[`~generation_utils.GenerationMixin.beam_sample`],
[`~generation_utils.GenerationMixin.group_beam_search`], and
[`~generation_utils.GenerationMixin.constrained_beam_search`].
Most of those are only useful if you are studying the code of the generate methods in the library.
## Generate Outputs
The output of [`~generation_utils.GenerationMixin.generate`] is an instance of a subclass of
[`~file_utils.ModelOutput`]. This output is a data structure containing all the information returned
by [`~generation_utils.GenerationMixin.generate`], but it can also be used as a tuple or dictionary.
Here's an example:
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
```
The `generation_output` object is a [`~generation_utils.GreedySearchDecoderOnlyOutput`]. As we can
see in the documentation of that class below, this means it has the following attributes:
- `sequences`: the generated sequences of tokens
- `scores` (optional): the prediction scores of the language modelling head, for each generation step
- `hidden_states` (optional): the hidden states of the model, for each generation step
- `attentions` (optional): the attention weights of the model, for each generation step
Here we have the `scores` since we passed along `output_scores=True`, but we don't have `hidden_states` and
`attentions` because we didn't pass `output_hidden_states=True` or `output_attentions=True`.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get `None`. Here for instance `generation_output.scores` are all the generated prediction scores of the
language modeling head, and `generation_output.attentions` is `None`.
When using our `generation_output` object as a tuple, it only keeps the attributes that don't have `None` values.
Here, for instance, it has two elements, `sequences` then `scores`, so
```python
generation_output[:2]
```
will return the tuple `(generation_output.sequences, generation_output.scores)`.
When using our `generation_output` object as a dictionary, it only keeps the attributes that don't have `None`
values. Here, for instance, it has two keys that are `sequences` and `scores`.
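Continuing the example above, here is a quick sketch of the dictionary-style access:
```python
print(generation_output.keys())  # odict_keys(['sequences', 'scores']): only non-None attributes
print(generation_output["scores"] is generation_output.scores)  # True: same object either way
```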
We document here all output types.
### GreedySearchOutput
[[autodoc]] generation_utils.GreedySearchDecoderOnlyOutput
[[autodoc]] generation_utils.GreedySearchEncoderDecoderOutput
[[autodoc]] generation_flax_utils.FlaxGreedySearchOutput
### SampleOutput
[[autodoc]] generation_utils.SampleDecoderOnlyOutput
[[autodoc]] generation_utils.SampleEncoderDecoderOutput
[[autodoc]] generation_flax_utils.FlaxSampleOutput
### BeamSearchOutput
[[autodoc]] generation_utils.BeamSearchDecoderOnlyOutput
[[autodoc]] generation_utils.BeamSearchEncoderDecoderOutput
### BeamSampleOutput
[[autodoc]] generation_utils.BeamSampleDecoderOnlyOutput
[[autodoc]] generation_utils.BeamSampleEncoderDecoderOutput
## LogitsProcessor
A [`LogitsProcessor`] can be used to modify the prediction scores of a language model head for
generation.
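As a minimal sketch of the interface (the token ids and vocabulary size below are made up), a processor receives the ids generated so far together with the next-token scores, and returns modified scores:
```python
import torch

from transformers import MinLengthLogitsProcessor

# Forbid the EOS token (id 2 here) until at least 5 tokens have been generated
processor = MinLengthLogitsProcessor(min_length=5, eos_token_id=2)

input_ids = torch.tensor([[0, 1, 3]])  # only 3 tokens generated so far
scores = torch.randn(1, 10)  # fake next-token logits over a 10-token vocabulary

new_scores = processor(input_ids, scores)
print(new_scores[0, 2])  # tensor(-inf): EOS cannot be sampled yet
```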
[[autodoc]] LogitsProcessor
- __call__
[[autodoc]] LogitsProcessorList
- __call__
[[autodoc]] LogitsWarper
- __call__
[[autodoc]] MinLengthLogitsProcessor
- __call__
[[autodoc]] TemperatureLogitsWarper
- __call__
[[autodoc]] RepetitionPenaltyLogitsProcessor
- __call__
[[autodoc]] TopPLogitsWarper
- __call__
[[autodoc]] TopKLogitsWarper
- __call__
[[autodoc]] NoRepeatNGramLogitsProcessor
- __call__
[[autodoc]] NoBadWordsLogitsProcessor
- __call__
[[autodoc]] PrefixConstrainedLogitsProcessor
- __call__
[[autodoc]] HammingDiversityLogitsProcessor
- __call__
[[autodoc]] ForcedBOSTokenLogitsProcessor
- __call__
[[autodoc]] ForcedEOSTokenLogitsProcessor
- __call__
[[autodoc]] InfNanRemoveLogitsProcessor
- __call__
[[autodoc]] TFLogitsProcessor
- __call__
[[autodoc]] TFLogitsProcessorList
- __call__
[[autodoc]] TFLogitsWarper
- __call__
[[autodoc]] TFTemperatureLogitsWarper
- __call__
[[autodoc]] TFTopPLogitsWarper
- __call__
[[autodoc]] TFTopKLogitsWarper
- __call__
[[autodoc]] TFMinLengthLogitsProcessor
- __call__
[[autodoc]] TFNoBadWordsLogitsProcessor
- __call__
[[autodoc]] TFNoRepeatNGramLogitsProcessor
- __call__
[[autodoc]] TFRepetitionPenaltyLogitsProcessor
- __call__
[[autodoc]] FlaxLogitsProcessor
- __call__
[[autodoc]] FlaxLogitsProcessorList
- __call__
[[autodoc]] FlaxLogitsWarper
- __call__
[[autodoc]] FlaxTemperatureLogitsWarper
- __call__
[[autodoc]] FlaxTopPLogitsWarper
- __call__
[[autodoc]] FlaxTopKLogitsWarper
- __call__
[[autodoc]] FlaxForcedBOSTokenLogitsProcessor
- __call__
[[autodoc]] FlaxForcedEOSTokenLogitsProcessor
- __call__
[[autodoc]] FlaxMinLengthLogitsProcessor
- __call__
## StoppingCriteria
A [`StoppingCriteria`] can be used to change when to stop generation (other than EOS token).
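As a minimal sketch of the interface (the token ids are made up), a criterion is called with the ids generated so far and returns whether generation should stop:
```python
import torch

from transformers import MaxLengthCriteria, StoppingCriteriaList

criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=5)])

input_ids = torch.tensor([[0, 1, 2, 3, 4]])  # 5 tokens generated so far
print(criteria(input_ids, None))  # True: the maximum length has been reached
```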
[[autodoc]] StoppingCriteria
- __call__
[[autodoc]] StoppingCriteriaList
- __call__
[[autodoc]] MaxLengthCriteria
- __call__
[[autodoc]] MaxTimeCriteria
- __call__
## Constraints
A [`Constraint`] can be used to force the generation to include specific tokens or sequences in the output.
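As a minimal sketch (the model and forced phrase are just examples), a constraint built from token ids is passed to `generate` together with beam search:
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer, PhrasalConstraint

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Force the phrase " very happy" to appear somewhere in the generated text
constraint = PhrasalConstraint(tokenizer(" very happy", add_special_tokens=False).input_ids)

inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(**inputs, constraints=[constraint], num_beams=5, max_length=20)
print(tokenizer.decode(outputs[0]))
```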
[[autodoc]] Constraint
[[autodoc]] PhrasalConstraint
[[autodoc]] DisjunctiveConstraint
[[autodoc]] ConstraintListState
## BeamSearch
[[autodoc]] BeamScorer
- process
- finalize
[[autodoc]] BeamSearchScorer
- process
- finalize
[[autodoc]] ConstrainedBeamSearchScorer
- process
- finalize
## Utilities
[[autodoc]] top_k_top_p_filtering
[[autodoc]] tf_top_k_top_p_filtering

View File

@ -1,230 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Utilities for Generation
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by :meth:`~transformers.generation_utils.GenerationMixin.generate`,
:meth:`~transformers.generation_utils.GenerationMixin.greedy_search`,
:meth:`~transformers.generation_utils.GenerationMixin.sample`,
:meth:`~transformers.generation_utils.GenerationMixin.beam_search`,
:meth:`~transformers.generation_utils.GenerationMixin.beam_sample`, and
:meth:`~transformers.generation_utils.GenerationMixin.group_beam_search`.
Most of those are only useful if you are studying the code of the generate methods in the library.
Generate Outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The output of :meth:`~transformers.generation_utils.GenerationMixin.generate` is an instance of a subclass of
:class:`~transformers.file_utils.ModelOutput`. This output is a data structure containing all the information returned
by :meth:`~transformers.generation_utils.GenerationMixin.generate`, but that can also be used as tuple or dictionary.
Here's an example:
.. code-block::
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
The ``generation_output`` object is a :class:`~transformers.generation_utils.GreedySearchDecoderOnlyOutput`, as we can
see in the documentation of that class below, it means it has the following attributes:
- ``sequences``: the generated sequences of tokens
- ``scores`` (optional): the prediction scores of the language modelling head, for each generation step
- ``hidden_states`` (optional): the hidden states of the model, for each generation step
- ``attentions`` (optional): the attention weights of the model, for each generation step
Here we have the ``scores`` since we passed along ``output_scores=True``, but we don't have ``hidden_states`` and
``attentions`` because we didn't pass ``output_hidden_states=True`` or ``output_attentions=True``.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get ``None``. Here for instance ``generation_output.scores`` are all the generated prediction scores of the
language modeling head, and ``generation_output.attentions`` is ``None``.
When using our ``generation_output`` object as a tuple, it only keeps the attributes that don't have ``None`` values.
Here, for instance, it has two elements, ``loss`` then ``logits``, so
.. code-block::
generation_output[:2]
will return the tuple ``(generation_output.sequences, generation_output.scores)`` for instance.
When using our ``generation_output`` object as a dictionary, it only keeps the attributes that don't have ``None``
values. Here, for instance, it has two keys that are ``sequences`` and ``scores``.
We document here all output types.
GreedySearchOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.generation_utils.GreedySearchDecoderOnlyOutput
:members:
.. autoclass:: transformers.generation_utils.GreedySearchEncoderDecoderOutput
:members:
.. autoclass:: transformers.generation_flax_utils.FlaxGreedySearchOutput
:members:
SampleOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.generation_utils.SampleDecoderOnlyOutput
:members:
.. autoclass:: transformers.generation_utils.SampleEncoderDecoderOutput
:members:
.. autoclass:: transformers.generation_flax_utils.FlaxSampleOutput
:members:
BeamSearchOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.generation_utils.BeamSearchDecoderOnlyOutput
:members:
.. autoclass:: transformers.generation_utils.BeamSearchEncoderDecoderOutput
:members:
BeamSampleOutput
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. autoclass:: transformers.generation_utils.BeamSampleDecoderOnlyOutput
:members:
.. autoclass:: transformers.generation_utils.BeamSampleEncoderDecoderOutput
:members:
LogitsProcessor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A :class:`~transformers.LogitsProcessor` can be used to modify the prediction scores of a language model head for
generation.
.. autoclass:: transformers.LogitsProcessor
:members: __call__
.. autoclass:: transformers.LogitsProcessorList
:members: __call__
.. autoclass:: transformers.LogitsWarper
:members: __call__
.. autoclass:: transformers.MinLengthLogitsProcessor
:members: __call__
.. autoclass:: transformers.TemperatureLogitsWarper
:members: __call__
.. autoclass:: transformers.RepetitionPenaltyLogitsProcessor
:members: __call__
.. autoclass:: transformers.TopPLogitsWarper
:members: __call__
.. autoclass:: transformers.TopKLogitsWarper
:members: __call__
.. autoclass:: transformers.NoRepeatNGramLogitsProcessor
:members: __call__
.. autoclass:: transformers.NoBadWordsLogitsProcessor
:members: __call__
.. autoclass:: transformers.PrefixConstrainedLogitsProcessor
:members: __call__
.. autoclass:: transformers.HammingDiversityLogitsProcessor
:members: __call__
.. autoclass:: transformers.ForcedBOSTokenLogitsProcessor
:members: __call__
.. autoclass:: transformers.ForcedEOSTokenLogitsProcessor
:members: __call__
.. autoclass:: transformers.InfNanRemoveLogitsProcessor
:members: __call__
.. autoclass:: transformers.FlaxLogitsProcessor
:members: __call__
.. autoclass:: transformers.FlaxLogitsProcessorList
:members: __call__
.. autoclass:: transformers.FlaxLogitsWarper
:members: __call__
.. autoclass:: transformers.FlaxTemperatureLogitsWarper
:members: __call__
.. autoclass:: transformers.FlaxTopPLogitsWarper
:members: __call__
.. autoclass:: transformers.FlaxTopKLogitsWarper
:members: __call__
.. autoclass:: transformers.FlaxForcedBOSTokenLogitsProcessor
:members: __call__
.. autoclass:: transformers.FlaxForcedEOSTokenLogitsProcessor
:members: __call__
.. autoclass:: transformers.FlaxMinLengthLogitsProcessor
:members: __call__
StoppingCriteria
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A :class:`~transformers.StoppingCriteria` can be used to change when to stop generation (other than EOS token).
.. autoclass:: transformers.StoppingCriteria
:members: __call__
.. autoclass:: transformers.StoppingCriteriaList
:members: __call__
.. autoclass:: transformers.MaxLengthCriteria
:members: __call__
.. autoclass:: transformers.MaxTimeCriteria
:members: __call__
BeamSearch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BeamScorer
:members: process, finalize
.. autoclass:: transformers.BeamSearchScorer
:members: process, finalize
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.top_k_top_p_filtering
.. autofunction:: transformers.tf_top_k_top_p_filtering

View File

@ -0,0 +1,82 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Custom Layers and Utilities
This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
Most of those are only useful if you are studying the code of the models in the library.
## PyTorch custom modules
[[autodoc]] modeling_utils.Conv1D
[[autodoc]] modeling_utils.PoolerStartLogits
- forward
[[autodoc]] modeling_utils.PoolerEndLogits
- forward
[[autodoc]] modeling_utils.PoolerAnswerClass
- forward
[[autodoc]] modeling_utils.SquadHeadOutput
[[autodoc]] modeling_utils.SQuADHead
- forward
[[autodoc]] modeling_utils.SequenceSummary
- forward
## PyTorch Helper Functions
[[autodoc]] apply_chunking_to_forward
[[autodoc]] modeling_utils.find_pruneable_heads_and_indices
[[autodoc]] modeling_utils.prune_layer
[[autodoc]] modeling_utils.prune_conv1d_layer
[[autodoc]] modeling_utils.prune_linear_layer
## TensorFlow custom layers
[[autodoc]] modeling_tf_utils.TFConv1D
[[autodoc]] modeling_tf_utils.TFSharedEmbeddings
- call
[[autodoc]] modeling_tf_utils.TFSequenceSummary
## TensorFlow loss functions
[[autodoc]] modeling_tf_utils.TFCausalLanguageModelingLoss
[[autodoc]] modeling_tf_utils.TFMaskedLanguageModelingLoss
[[autodoc]] modeling_tf_utils.TFMultipleChoiceLoss
[[autodoc]] modeling_tf_utils.TFQuestionAnsweringLoss
[[autodoc]] modeling_tf_utils.TFSequenceClassificationLoss
[[autodoc]] modeling_tf_utils.TFTokenClassificationLoss
## TensorFlow Helper Functions
[[autodoc]] modeling_tf_utils.get_initializer
[[autodoc]] modeling_tf_utils.keras_serializable
[[autodoc]] modeling_tf_utils.shape_list

View File

@ -1,97 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Custom Layers and Utilities
-----------------------------------------------------------------------------------------------------------------------
This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
Most of those are only useful if you are studying the code of the models in the library.
Pytorch custom modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_utils.Conv1D
.. autoclass:: transformers.modeling_utils.PoolerStartLogits
:members: forward
.. autoclass:: transformers.modeling_utils.PoolerEndLogits
:members: forward
.. autoclass:: transformers.modeling_utils.PoolerAnswerClass
:members: forward
.. autoclass:: transformers.modeling_utils.SquadHeadOutput
.. autoclass:: transformers.modeling_utils.SQuADHead
:members: forward
.. autoclass:: transformers.modeling_utils.SequenceSummary
:members: forward
PyTorch Helper Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.apply_chunking_to_forward
.. autofunction:: transformers.modeling_utils.find_pruneable_heads_and_indices
.. autofunction:: transformers.modeling_utils.prune_layer
.. autofunction:: transformers.modeling_utils.prune_conv1d_layer
.. autofunction:: transformers.modeling_utils.prune_linear_layer
TensorFlow custom layers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFConv1D
.. autoclass:: transformers.modeling_tf_utils.TFSharedEmbeddings
:members: call
.. autoclass:: transformers.modeling_tf_utils.TFSequenceSummary
TensorFlow loss functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.modeling_tf_utils.TFCausalLanguageModelingLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFMaskedLanguageModelingLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFMultipleChoiceLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFQuestionAnsweringLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFSequenceClassificationLoss
:members:
.. autoclass:: transformers.modeling_tf_utils.TFTokenClassificationLoss
:members:
TensorFlow Helper Functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.modeling_tf_utils.get_initializer
.. autofunction:: transformers.modeling_tf_utils.keras_serializable
.. autofunction:: transformers.modeling_tf_utils.shape_list

View File

@ -0,0 +1,40 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Utilities for pipelines
This page lists all the utility functions the library provides for pipelines.
Most of those are only useful if you are studying the code of the models in the library.
## Argument handling
[[autodoc]] pipelines.ArgumentHandler
[[autodoc]] pipelines.ZeroShotClassificationArgumentHandler
[[autodoc]] pipelines.QuestionAnsweringArgumentHandler
## Data format
[[autodoc]] pipelines.PipelineDataFormat
[[autodoc]] pipelines.CsvPipelineDataFormat
[[autodoc]] pipelines.JsonPipelineDataFormat
[[autodoc]] pipelines.PipedPipelineDataFormat
## Utilities
[[autodoc]] pipelines.PipelineException

View File

@ -1,50 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Utilities for pipelines
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions the library provides for pipelines.
Most of those are only useful if you are studying the code of the models in the library.
Argument handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.pipelines.ArgumentHandler
.. autoclass:: transformers.pipelines.ZeroShotClassificationArgumentHandler
.. autoclass:: transformers.pipelines.QuestionAnsweringArgumentHandler
Data format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.pipelines.PipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.CsvPipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.JsonPipelineDataFormat
:members:
.. autoclass:: transformers.pipelines.PipedPipelineDataFormat
:members:
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.pipelines.PipelineException

View File

@ -0,0 +1,38 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Utilities for Tokenizers
This page lists all the utility functions used by the tokenizers, mainly the class
[`~tokenization_utils_base.PreTrainedTokenizerBase`] that implements the common methods between
[`PreTrainedTokenizer`] and [`PreTrainedTokenizerFast`] and the mixin
[`~tokenization_utils_base.SpecialTokensMixin`].
Most of those are only useful if you are studying the code of the tokenizers in the library.
## PreTrainedTokenizerBase
[[autodoc]] tokenization_utils_base.PreTrainedTokenizerBase
- __call__
- all
## SpecialTokensMixin
[[autodoc]] tokenization_utils_base.SpecialTokensMixin
## Enums and namedtuples
[[autodoc]] tokenization_utils_base.TruncationStrategy
[[autodoc]] tokenization_utils_base.CharSpan
[[autodoc]] tokenization_utils_base.TokenSpan

View File

@ -1,45 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Utilities for Tokenizers
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by the tokenizers, mainly the class
:class:`~transformers.tokenization_utils_base.PreTrainedTokenizerBase` that implements the common methods between
:class:`~transformers.PreTrainedTokenizer` and :class:`~transformers.PreTrainedTokenizerFast` and the mixin
:class:`~transformers.tokenization_utils_base.SpecialTokensMixin`.
Most of those are only useful if you are studying the code of the tokenizers in the library.
PreTrainedTokenizerBase
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.PreTrainedTokenizerBase
:special-members: __call__
:members:
SpecialTokensMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.SpecialTokensMixin
:members:
Enums and namedtuples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.tokenization_utils_base.TruncationStrategy
.. autoclass:: transformers.tokenization_utils_base.CharSpan
.. autoclass:: transformers.tokenization_utils_base.TokenSpan

View File

@ -0,0 +1,43 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Utilities for Trainer
This page lists all the utility functions used by [`Trainer`].
Most of those are only useful if you are studying the code of the Trainer in the library.
## Utilities
[[autodoc]] EvalPrediction
[[autodoc]] IntervalStrategy
[[autodoc]] set_seed
[[autodoc]] torch_distributed_zero_first
## Callbacks internals
[[autodoc]] trainer_callback.CallbackHandler
## Distributed Evaluation
[[autodoc]] trainer_pt_utils.DistributedTensorGatherer
## Trainer Argument Parser
[[autodoc]] HfArgumentParser
## Debug Utilities
[[autodoc]] debug_utils.DebugUnderflowOverflow

View File

@ -1,54 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Utilities for Trainer
-----------------------------------------------------------------------------------------------------------------------
This page lists all the utility functions used by :class:`~transformers.Trainer`.
Most of those are only useful if you are studying the code of the Trainer in the library.
Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.EvalPrediction
.. autoclass:: transformers.IntervalStrategy
.. autofunction:: transformers.set_seed
.. autofunction:: transformers.torch_distributed_zero_first
Callbacks internals
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.trainer_callback.CallbackHandler
Distributed Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.trainer_pt_utils.DistributedTensorGatherer
:members:
Distributed Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.HfArgumentParser
Debug Utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.debug_utils.DebugUnderflowOverflow

View File

@ -0,0 +1,111 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Callbacks
Callbacks are objects that can customize the behavior of the training loop in the PyTorch
[`Trainer`] (this feature is not yet implemented in TensorFlow). They can inspect the training loop
state (for progress reporting, logging on TensorBoard or other ML platforms...) and take decisions (like early
stopping).
Callbacks are "read only" pieces of code, apart from the [`TrainerControl`] object they return, they
cannot change anything in the training loop. For customizations that require changes in the training loop, you should
subclass [`Trainer`] and override the methods you need (see [trainer](trainer) for examples).
By default a [`Trainer`] will use the following callbacks:
- [`DefaultFlowCallback`] which handles the default behavior for logging, saving and evaluation.
- [`PrinterCallback`] or [`ProgressCallback`] to display progress and print the
logs (the first one is used if you deactivate tqdm through the [`TrainingArguments`], otherwise
it's the second one).
- [`~integrations.TensorBoardCallback`] if tensorboard is accessible (either through PyTorch >= 1.4
or tensorboardX).
- [`~integrations.WandbCallback`] if [wandb](https://www.wandb.com/) is installed.
- [`~integrations.CometCallback`] if [comet_ml](https://www.comet.ml/site/) is installed.
- [`~integrations.MLflowCallback`] if [mlflow](https://www.mlflow.org/) is installed.
- [`~integrations.AzureMLCallback`] if [azureml-sdk](https://pypi.org/project/azureml-sdk/) is
installed.
- [`~integrations.CodeCarbonCallback`] if [codecarbon](https://pypi.org/project/codecarbon/) is
installed.
The main class that implements callbacks is [`TrainerCallback`]. It gets the
[`TrainingArguments`] used to instantiate the [`Trainer`], can access that
Trainer's internal state via [`TrainerState`], and can take some actions on the training loop via
[`TrainerControl`].
## Available Callbacks
Here is the list of the available [`TrainerCallback`] in the library:
[[autodoc]] integrations.CometCallback
- setup
[[autodoc]] DefaultFlowCallback
[[autodoc]] PrinterCallback
[[autodoc]] ProgressCallback
[[autodoc]] EarlyStoppingCallback
[[autodoc]] integrations.TensorBoardCallback
[[autodoc]] integrations.WandbCallback
- setup
[[autodoc]] integrations.MLflowCallback
- setup
[[autodoc]] integrations.AzureMLCallback
[[autodoc]] integrations.CodeCarbonCallback
## TrainerCallback
[[autodoc]] TrainerCallback
Here is an example of how to register a custom callback with the PyTorch [`Trainer`]:
```python
class MyCallback(TrainerCallback):
    "A callback that prints a message at the beginning of training"

    def on_train_begin(self, args, state, control, **kwargs):
        print("Starting training")


trainer = Trainer(
    model,
    args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[MyCallback],  # We can either pass the callback class this way or an instance of it (MyCallback())
)
```
Another way to register a callback is to call `trainer.add_callback()` as follows:
```python
trainer = Trainer(...)
trainer.add_callback(MyCallback)
# Alternatively, we can pass an instance of the callback class
trainer.add_callback(MyCallback())
```
## TrainerState
[[autodoc]] TrainerState
## TrainerControl
[[autodoc]] TrainerControl

View File

@ -1,115 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Callbacks
-----------------------------------------------------------------------------------------------------------------------
Callbacks are objects that can customize the behavior of the training loop in the PyTorch
:class:`~transformers.Trainer` (this feature is not yet implemented in TensorFlow) that can inspect the training loop
state (for progress reporting, logging on TensorBoard or other ML platforms...) and take decisions (like early
stopping).
Callbacks are "read only" pieces of code, apart from the :class:`~transformers.TrainerControl` object they return, they
cannot change anything in the training loop. For customizations that require changes in the training loop, you should
subclass :class:`~transformers.Trainer` and override the methods you need (see :doc:`trainer` for examples).
By default a :class:`~transformers.Trainer` will use the following callbacks:
- :class:`~transformers.DefaultFlowCallback` which handles the default behavior for logging, saving and evaluation.
- :class:`~transformers.PrinterCallback` or :class:`~transformers.ProgressCallback` to display progress and print the
logs (the first one is used if you deactivate tqdm through the :class:`~transformers.TrainingArguments`, otherwise
it's the second one).
- :class:`~transformers.integrations.TensorBoardCallback` if tensorboard is accessible (either through PyTorch >= 1.4
or tensorboardX).
- :class:`~transformers.integrations.WandbCallback` if `wandb <https://www.wandb.com/>`__ is installed.
- :class:`~transformers.integrations.CometCallback` if `comet_ml <https://www.comet.ml/site/>`__ is installed.
- :class:`~transformers.integrations.MLflowCallback` if `mlflow <https://www.mlflow.org/>`__ is installed.
- :class:`~transformers.integrations.AzureMLCallback` if `azureml-sdk <https://pypi.org/project/azureml-sdk/>`__ is
installed.
The main class that implements callbacks is :class:`~transformers.TrainerCallback`. It gets the
:class:`~transformers.TrainingArguments` used to instantiate the :class:`~transformers.Trainer`, can access that
Trainer's internal state via :class:`~transformers.TrainerState`, and can take some actions on the training loop via
:class:`~transformers.TrainerControl`.
Available Callbacks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here is the list of the available :class:`~transformers.TrainerCallback` in the library:
.. autoclass:: transformers.integrations.CometCallback
:members: setup
.. autoclass:: transformers.DefaultFlowCallback
.. autoclass:: transformers.PrinterCallback
.. autoclass:: transformers.ProgressCallback
.. autoclass:: transformers.EarlyStoppingCallback
.. autoclass:: transformers.integrations.TensorBoardCallback
.. autoclass:: transformers.integrations.WandbCallback
:members: setup
.. autoclass:: transformers.integrations.MLflowCallback
:members: setup
.. autoclass:: transformers.integrations.AzureMLCallback
TrainerCallback
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainerCallback
:members:
Here is an example of how to register a custom callback with the PyTorch :class:`~transformers.Trainer`:
.. code-block:: python
class MyCallback(TrainerCallback):
"A callback that prints a message at the beginning of training"
def on_train_begin(self, args, state, control, **kwargs):
print("Starting training")
trainer = Trainer(
model,
args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
callbacks=[MyCallback] # We can either pass the callback class this way or an instance of it (MyCallback())
)
Another way to register a callback is to call ``trainer.add_callback()`` as follows:
.. code-block:: python
trainer = Trainer(...)
trainer.add_callback(MyCallback)
# Alternatively, we can pass an instance of the callback class
trainer.add_callback(MyCallback())
TrainerState
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainerState
:members:
TrainerControl
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TrainerControl
:members:

View File

@ -0,0 +1,28 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Configuration
The base class [`PretrainedConfig`] implements the common methods for loading/saving a configuration
either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded
from HuggingFace's AWS S3 repository).
Each derived config class implements model-specific attributes. Common attributes present in all config classes are
`hidden_size`, `num_attention_heads`, and `num_hidden_layers`. Text models further implement `vocab_size`.
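In practice a configuration is rarely written by hand; it is loaded and saved with the shared
`from_pretrained`/`save_pretrained` methods. A minimal sketch (the checkpoint name and output directory are
illustrative):

```python
from transformers import BertConfig

# Download the configuration of a pretrained checkpoint and tweak an attribute
config = BertConfig.from_pretrained("bert-base-uncased")
config.num_hidden_layers = 6

# Persist the modified configuration to a local directory
config.save_pretrained("./my-config")
```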
## PretrainedConfig
[[autodoc]] PretrainedConfig
- push_to_hub
- all

@@ -1,31 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Configuration
-----------------------------------------------------------------------------------------------------------------------
The base class :class:`~transformers.PretrainedConfig` implements the common methods for loading/saving a configuration
either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded
from HuggingFace's AWS S3 repository).
Each derived config class implements model-specific attributes. Common attributes present in all config classes are
:obj:`hidden_size`, :obj:`num_attention_heads`, and :obj:`num_hidden_layers`. Text models further implement
:obj:`vocab_size`.
PretrainedConfig
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.PretrainedConfig
:special-members: push_to_hub
:members:

@@ -0,0 +1,64 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Data Collator
Data collators are objects that form a batch from a list of dataset elements. These elements are of the same type as
the elements of `train_dataset` or `eval_dataset`.
To be able to build batches, data collators may apply some processing (like padding). Some of them (like
[`DataCollatorForLanguageModeling`]) also apply some random data augmentation (like random masking)
on the formed batch.
Examples of use can be found in the [example scripts](../examples) or [example notebooks](../notebooks).
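As a concrete illustration, here is a minimal sketch of dynamic padding with [`DataCollatorWithPadding`] (the
checkpoint and sample sentences are illustrative):

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Two dataset elements of different lengths...
features = [tokenizer("Hello world"), tokenizer("A slightly longer sentence than the first")]

# ...are padded to a common length when the batch is formed
batch = data_collator(features)
print(batch["input_ids"].shape)
```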
## Default data collator
[[autodoc]] data.data_collator.default_data_collator
## DefaultDataCollator
[[autodoc]] data.data_collator.DefaultDataCollator
## DataCollatorWithPadding
[[autodoc]] data.data_collator.DataCollatorWithPadding
## DataCollatorForTokenClassification
[[autodoc]] data.data_collator.DataCollatorForTokenClassification
## DataCollatorForSeq2Seq
[[autodoc]] data.data_collator.DataCollatorForSeq2Seq
## DataCollatorForLanguageModeling
[[autodoc]] data.data_collator.DataCollatorForLanguageModeling
- numpy_mask_tokens
- tf_mask_tokens
- torch_mask_tokens
## DataCollatorForWholeWordMask
[[autodoc]] data.data_collator.DataCollatorForWholeWordMask
- numpy_mask_tokens
- tf_mask_tokens
- torch_mask_tokens
## DataCollatorForPermutationLanguageModeling
[[autodoc]] data.data_collator.DataCollatorForPermutationLanguageModeling
- numpy_mask_tokens
- tf_mask_tokens
- torch_mask_tokens

@@ -1,78 +0,0 @@
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Data Collator
-----------------------------------------------------------------------------------------------------------------------
Data collators are objects that form a batch from a list of dataset elements. These elements are of the same type as
the elements of :obj:`train_dataset` or :obj:`eval_dataset`.
To be able to build batches, data collators may apply some processing (like padding). Some of them (like
:class:`~transformers.DataCollatorForLanguageModeling`) also apply some random data augmentation (like random masking)
on the formed batch.
Examples of use can be found in the :doc:`example scripts <../examples>` or :doc:`example notebooks <../notebooks>`.
Default data collator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: transformers.data.data_collator.default_data_collator
DefaultDataCollator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DefaultDataCollator
:members:
DataCollatorWithPadding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorWithPadding
:members:
DataCollatorForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForTokenClassification
:members:
DataCollatorForSeq2Seq
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForSeq2Seq
:members:
DataCollatorForLanguageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForLanguageModeling
:members: numpy_mask_tokens, tf_mask_tokens, torch_mask_tokens
DataCollatorForWholeWordMask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForWholeWordMask
:members: numpy_mask_tokens, tf_mask_tokens, torch_mask_tokens
DataCollatorForPermutationLanguageModeling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.data.data_collator.DataCollatorForPermutationLanguageModeling
:members: numpy_mask_tokens, tf_mask_tokens, torch_mask_tokens

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,38 @@
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Feature Extractor
A feature extractor is in charge of preparing input features for a multi-modal model. This includes feature extraction
from sequences, *e.g.*, pre-processing audio files into Log-Mel spectrogram features, feature extraction from images,
*e.g.*, cropping image files, but also padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow
tensors.
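For example, here is a minimal sketch of preparing a raw audio array with a pretrained extractor (the checkpoint name
and the dummy audio are illustrative):

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

# One second of silent mono audio sampled at 16 kHz
raw_audio = np.zeros(16000, dtype=np.float32)
inputs = feature_extractor(raw_audio, sampling_rate=16000, return_tensors="pt")
print(inputs["input_values"].shape)
```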
## FeatureExtractionMixin
[[autodoc]] feature_extraction_utils.FeatureExtractionMixin
- from_pretrained
- save_pretrained
## SequenceFeatureExtractor
[[autodoc]] SequenceFeatureExtractor
- pad
## BatchFeature
[[autodoc]] BatchFeature
## ImageFeatureExtractionMixin
[[autodoc]] image_utils.ImageFeatureExtractionMixin

@@ -1,48 +0,0 @@
..
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
Feature Extractor
-----------------------------------------------------------------------------------------------------------------------
A feature extractor is in charge of preparing input features for a multi-modal model. This includes feature extraction
from sequences, *e.g.*, pre-processing audio files into Log-Mel spectrogram features, feature extraction from images,
*e.g.*, cropping image files, but also padding, normalization, and conversion to NumPy, PyTorch, and TensorFlow
tensors.
FeatureExtractionMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.feature_extraction_utils.FeatureExtractionMixin
:members: from_pretrained, save_pretrained
SequenceFeatureExtractor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.SequenceFeatureExtractor
:members: pad
BatchFeature
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BatchFeature
:members:
ImageFeatureExtractionMixin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.image_utils.ImageFeatureExtractionMixin
:members:

@@ -0,0 +1,24 @@
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Keras callbacks
When training a Transformers model with Keras, there are some library-specific callbacks available to automate common
tasks:
## KerasMetricCallback
[[autodoc]] KerasMetricCallback
## PushToHubCallback
[[autodoc]] PushToHubCallback
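As a usage illustration, here is a minimal sketch wiring both callbacks into `model.fit()` (the `compute_metrics`
function, the `tf.data` datasets, and the tokenizer are assumed to exist already):

```python
from transformers.keras_callbacks import KerasMetricCallback, PushToHubCallback

# Compute custom metrics on the validation set at the end of each epoch
metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_eval_dataset)

# Save and upload the model to the Hugging Face Hub as training progresses
push_callback = PushToHubCallback(output_dir="./model_checkpoints", tokenizer=tokenizer)

model.fit(tf_train_dataset, validation_data=tf_eval_dataset, epochs=3, callbacks=[metric_callback, push_callback])
```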

Some files were not shown because too many files have changed in this diff