* examples: add Bloom support for token classification (FLAX, PyTorch and TensorFlow)
* examples: remove support for Bloom in token classication (FLAX and TensorFlow currently have no support for it)
* bnb minor modifications
- refactor documentation
- add troubleshooting README
- add PyPi library on DockerFile
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* put in one block
- put bash instructions in one block
* update readme
- refactor a bit hardware requirements
* change text a bit
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* apply suggestions
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* add link to paper
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update tests/mixed_int8/README.md
* Apply suggestions from code review
* refactor a bit
* add instructions Turing & Amperer
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add A6000
* clarify a bit
* remove small part
* Update tests/mixed_int8/README.md
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Update run_translation_no_trainer.py
found an error in selecting `no_decay` parameters and some small modifications when the user continues to train from a checkpoint
* fixs `no_decay` and `resume_step` issue
1. change `no_decay` list
2. if use continue to train their model from provided checkpoint, the `resume_step` will not be initialized properly if `args.gradient_accumulation_steps != 1`
* [fsmt] deal with -100 indices in decoder ids
Fixes: https://github.com/huggingface/transformers/issues/17945
decoder ids get the default index -100, which breaks the model - like t5 and many other models add a fix to replace -100 with the correct pad index.
For some reason this use case hasn't been used with this model until recently - so this issue was there since the beginning it seems.
Any suggestions to how to add a simple test here? or perhaps we have something similar already? user's script is quite massive.
* style
* Supporting seq2seq models for `bitsandbytes` integration
- `bitsandbytes` integration supports now seq2seq models
- check if a model has tied weights as an additional check
* small modification
- tie the weights before looking at tied weights!
* initial commit
* add small test
* add cross pt tf flag to test
* fix quality
* style
* update test with new repo
* fix failing test
* update
* fix wrong param ordering
* style
* update based on review
* update related to recent new caching mechanism
* quality
* Update based on review
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* quality and style
* Update src/transformers/modeling_flax_utils.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix critical trace warnings to allow ONNX export
* Force input to `sqrt` to be float type
* Cleanup code
* Remove unused import statement
* Update model sew
* Small refactor
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Use broadcasting instead of repeat
* Implement suggestion
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Match deberta v2 changes in sew_d
* Improve code quality
* Update code quality
* Consistency of small refactor
* Match changes in sew_d
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* changing BartLearnedPositionalEmbedding forward signature and references to it
* removing debugging dead code (thanks style checker)
* blackened modeling_bart file
* removing copy inconsistencies via make fix-copies
* changing references to copied signatures in Bart variants
* make fix-copies once more
* using expand over repeat (thanks @michaelbenayoun)
* expand instead of repeat for all model copies
Co-authored-by: Daniel Jones <jonesdaniel@microsoft.com>
* fix typos
* fix sequence_length docs of LayoutLMv3Model
* delete trailing white spaces
* fix layoutlmv3 docs more
* apply make fixup & quality
* change to two versions of input docstring
* apply make fixup & quality
* onnx config for clip
* default opset as 14
* changes from the original repo
* input values order fix
* outputs fix
* remove unused import
* ran make fix-copies
* black format
* review comments: forward ref, import fix, model change revert, .to cleanup
* make style
* formatting fixes
* revert groupvit
* comment for cast to int32
* comment fix
* make .T as .t() for onnx conversion
* ran make fix-copies
* remove unneeded comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix copies
* remove comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* `pipeline` support for `device="mps"` (or any other string)
* Simplify `if` nesting
* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix? @sgugger
* passing `attr=None` is not the same as not passing `attr` 🤯
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Finished QA example
* Dodge a merge conflict
* Update text classification and LM examples
* Update NER example
* New Keras metrics WIP, fix NER example
* Update NER example
* Update MC, summarization and translation examples
* Add XLA warnings when shapes are variable
* Make sure batch_size is consistently scaled by num_replicas
* Add PushToHubCallback to all models
* Add docs links for KerasMetricCallback
* Add docs links for prepare_tf_dataset and jit_compile
* Correct inferred model names
* Don't assume the dataset has 'lang'
* Don't assume the dataset has 'lang'
* Write metrics in text classification
* Add 'framework' to TrainingArguments and TFTrainingArguments
* Export metrics in all examples and add tests
* Fix training args for Flax
* Update command line args for translation test
* make fixup
* Fix accidentally running other tests in fp16
* Remove do_train/do_eval from run_clm.py
* Remove do_train/do_eval from run_mlm.py
* Add tensorflow tests to circleci
* Fix circleci
* Update examples/tensorflow/language-modeling/run_mlm.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/test_tensorflow_examples.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/translation/run_translation.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Update examples/tensorflow/token-classification/run_ner.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix save path for tests
* Fix some model card kwargs
* Explain the magical -1000
* Actually enable tests this time
* Skip text classification PR until we fix shape inference
* make fixup
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* fix deberta issues
* add different code paths for gpu and tpu
* shorter gpu take along axis
* Stable Dropout without tf cond
* variable must be float
* first commit
* correct replace function
* add final changes
- works like charm!
- cannot implement tests yet
- tested
* clean up a bit
* add bitsandbytes dependencies
* working version
- added import function
- added bitsandbytes utils file
* small fix
* small fix
- fix import issue
* fix import issues
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit
- move bitsandbytes utils to utils
- change comments on functions
* reformat docstring
- reformat docstring on init_empty_weights_8bit
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* revert bad formatting
* change to bitsandbytes
* refactor a bit
- remove init8bit since it is useless
* more refactoring
- fixed init empty weights issue
- added threshold param
* small hack to make it work
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
* revmoe the small hack
* modify utils file
* make style + refactor a bit
* create correctly device map
* add correct dtype for device map creation
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
- remove with torch.grad
- do not rely on Python bool magic!
* add docstring
- add docstring for new kwargs
* add docstring
- comment `replace_8bit_linear` function
- fix weird formatting
* - added more documentation
- added new utility function for memory footprint tracking
- colab demo to add
* few modifs
- typo doc
- force cast into float16 when load_in_8bit is enabled
* added colab link
* add test architecture + docstring a bit
* refactor a bit testing class
* make style + refactor a bit
* enhance checks
- add more checks
- start writing saving test
* clean up a bit
* male style
* add more details on doc
* add more tests
- still needs to fix 2 tests
* replace by "or"
- could not fix it from GitHub GUI
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor a bit testing code + add readme
* make style
* fix import issue
* Update src/transformers/modeling_utils.py
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* add few comments
* add more doctring + make style
* more docstring
* raise error when loaded in 8bit
* make style
* add warning if loaded on CPU
* add small sanity check
* fix small comment
* add bitsandbytes on dockerfile
* Improve documentation
- improve documentation from comments
* add few comments
* slow tests pass on the VM but not on the CI VM
* Fix merge conflict
* make style
* another test should pass on a multi gpu setup
* fix bad import in testing file
* Fix slow tests
- remove dummy batches
- no more CUDA illegal memory errors
* odify dockerfile
* Update docs/source/en/main_classes/model.mdx
* Update Dockerfile
* Update model.mdx
* Update Dockerfile
* Apply suggestions from code review
* few modifications
- lm head can stay on disk/cpu
- change model name so that test pass
* change test value
- change test value to the correct output
- torch bmm changed to baddmm in bloom modeling when merging
* modify installation guidelines
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* replace `n`by `name`
* merge `load_in_8bit` and `low_cpu_mem_usage`
* first try - keep the lm head in full precision
* better check
- check the attribute `base_model_prefix` instead of computing the number of parameters
* added more tests
* Update src/transformers/utils/bitsandbytes.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Merge branch 'integration-8bit' of https://github.com/younesbelkada/transformers into integration-8bit
* improve documentation
- fix typos for installation
- change title in the documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* update features
* MT5OnnxConfig added with updated with tests and docs
* fix imports
* fix onnc_config_cls for mt5
Co-authored-by: Thomas Chaigneau <thomas.deeptools.ai>
* Added accelerate gradient accumulation wrapper to run_image_classification_no_trainer.py example script
* make fixup changes
* PR comments
* changed input to Acceletor based on PR comment, ran make fixup
* Added comment explaining the sync_gradients statement
* Fixed lr scheduler max steps
* Changed run_clm_no_trainer.py script to use accelerate gradient accum wrapper
* Fixed all scripts except wav2vec2 pretraining to use accelerate gradient accum wrapper
* Added accelerate gradient accum wrapper for wav2vec2_pretraining_no_trainer.py script
* make fixup and lr_scheduler step inserted back into run_qa_beam_search_no_trainer.py
* removed changes to run_wav2vec2_pretraining_no_trainer.py script and fixed using wrong constant in qa_beam_search_no_trainer.py script
* [DX fix] Fixing QA pipeline streaming a dataset.
QuestionAnsweringArgumentHandler would iterate over the whole dataset
effectively killing all properties of the pipeline.
This restores nice properties when using `Dataset` or `Generator` since
those are meant to be consumed lazily.
* Handling TF better.
* Delete valohai.yaml
* NLP => ML
* typo
* website supports https
* datasets
* 60k + modalities
* unrelated link fixing for accelerate
* Ok those links were actually broken
* Fix link
* Make `AutoTokenizer` auto-link
* wording tweak
* add at least one non-nlp task
* Draft new cached_file
* Initial draft for config and model
* Small fixes
* Fix first batch of tests
* Look in cache when internet is down
* Fix last tests
* Bad black, not fixing all quality errors
* Make diff less
* Implement change for TF and Flax models
* Add tokenizer and feature extractor
* For compatibility with main
* Add utils to move the cache and auto-do it at first use.
* Quality
* Deal with empty commit shas
* Deal with empty etag
* Address review comments
* Adding a better error message when the model is improperly configured
within transformers.
* Update src/transformers/pipelines/__init__.py
* Black version.
* Overriding task aliases so that tokenizer+feature_extractor
values are correct.
* Fixing task aliases by overriding their names early
* X.
* Fixing feature-extraction.
* black again.
* Normalizing `translation` too.
* Fixing last few corner cases.
translation need to use its non normalized name (translation_XX_to_YY,
so that the task_specific_params are correctly overloaded).
This can be removed and cleaned up in a later PR.
`speech-encode-decoder` actually REQUIRES to pass a `tokenizer` manually
so the error needs to be discarded when the `tokenizer` is already
there.
* doc-builder fix.
* Fixing the real issue.
* Removing dead code.
* Do not import the actual config classes.
* First draft
* Add VideoMAEForVideoClassification
* Improve conversion script
* Add VideoMAEForPreTraining
* Add VideoMAEFeatureExtractor
* Improve VideoMAEFeatureExtractor
* Improve docs
* Add first draft of model tests
* Improve VideoMAEForPreTraining
* Fix base_model_prefix
* Make model take pixel_values of shape (B, T, C, H, W)
* Add loss computation of VideoMAEForPreTraining
* Improve tests
* Improve model testsé
* Make all tests pass
* Add VideoMAE to main README
* Add tests for VideoMAEFeatureExtractor
* Add integration test
* Improve conversion script
* Rename patch embedding class
* Remove VideoMAELayer from init
* Update design of patch embeddings
* Improve comments
* Improve conversion script
* Improve conversion script
* Add conversion of pretrained model
* Add loss verification of pretrained model
* Add loss verification of unnormalized targets
* Add integration test for pretraining model
* Apply suggestions from code review
* Fix bug to make feature extractor resize only shorter edge
* Address more comments
* Improve normalization of videos
* Add doc examples
* Move constants to dedicated script
* Remove scripts
* Transfer checkpoints, fix docs
* Update script
* Update image mean and std
* Fix doc tests
* Set return_tensors to NumPy by default
* Revert the previous change
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Enable HFTracer to trace with custom dummy inputs instead of pre-computed ones
* Add HFTracer.trace docstring, and make it possible to handle callable and torch.nn.Module in general
* Remove pdb comment
* Apply suggestions
* Cleanup some code
* Improve signatures
* Try to reduce the number of reshape/copies
* I don't think we actually need the layer_num scaling trick
* No need for duplication
* Try to fix beam_search
* Fix beam search
* Removing layer num normalization seems to be breaking
* Not sure self.layer_number normalization actually matters
* Try and be backward compatible
* Try to fix beam_search
* Revert attempt to be backward compatible
* Improve documentation on past_key_values format
* Optimize the device allocation in case of hidden_states in multiple devices
* No need to manually cast the values to a specific device
* Rename with long version of variables
* Improve type hinting
* Add comment that explains that some methods return views
* Actually i think the attention casting only makes sense when we use torch.float16
* We don't actually need layer_number to be passed anymore
* Fix FX test
* Bypass torch.baddbmm
* Apply suggestions from code review
* Add comment about support for torchScript v1.11
* fix ONNX support for bloom (#18456)
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Comparisons like
version.parse(torch.__version__) > version.parse("1.6")
are True for torch==1.6.0+cu101 or torch==1.6.0+cpu
version.parse(version.parse(torch.__version__).base_version) are preferred (and available in pytorch_utils.py
* fix: keras fit tests for segformer tf and minor refactors.
* refactor: test_keras_fit to make it simpler using the existing one.
* fix: styling issues.
* Add file in spanish docs to be translated
* Translate first two sections to Spanish
* Translate four additional sections to Spanish
* Finish translation to Spanish
* Improve writing style in Spanish
* Add suggested changes from reviewer
This PR moves GroupViT and LXMert to their correct sections. As pointed out by @NielsRogge and @LysandreJik, GroupViT and LXMert are both multimodal models.
* Update pipeline word heuristic to work with whitespace in token offsets
This change checks for whitespace in the input string at either the
character preceding the token or in the first character of the token.
This works with tokenizers that return offsets excluding whitespace
between words or with offsets including whitespace.
fixes#18111
starting
* Use smaller model, ensure expected tokenization
* Re-run CI (please squash)
`torch.Tensor` creates an unitialized tensor (as via `torch.empty`), this leads to undeterministic behavior, poor initialization, and nans if you have unlucky init. The paper does not specify the initialization for bias terms, so I guess zero seems like a good choice - no bias initially. `torch.Tensor` is usually populated with zeros, so this fix will be close to the intended behavior:
```
>>> torch.Tensor(100, 100).sum()
tensor(0.)
>>> torch.Tensor(100, 100).sum()
tensor(nan)
>>> torch.Tensor(100, 100).sum()
tensor(0.)
```
* Added option for users to modify config parameter used by pytesseract during feature extraction
- Added optional 'tess_config' kwarg when setting up LayoutLMV2 processor that is used by pytesseract during feature extraction
- Eg. Can be used to modify psm values by setting tess_config to '--psm 7'
- Different psm values significantly influences the output of layoutlmv2
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Updated variable names to be more explicit
* Fixed styles
* Added option for users to modify config parameter when calling pytesseract during feature extraction
- Added option to set "tesseract_config" parameter during LayoutLMV3 processor initialization
- Can be used to modify PSM values, eg. by setting tesseract_config="--psm 6"
* Removed from function signature
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* add LUKE models for downstream tasks
* add new LUKE models to docs
* fix typos
* remove commented lines
* exclude None items from tuple return values
Fix#18385
I don't know whether `use_auth_token`, `cache_dir` and `local_files_only` should be passed to `(cls.slow_tokenizer_class)._from_pretrained`, but I guess it should.
* Bloom model can now be traced
* Bloom traced model can be torch scripted and serialized
* Bloom can be traced with variable keyword arguments
* Enable XLNet support
* Disable XLNet for now
Currently, tensorflow examples use the `load_metric` function from
Datasets library, commit migrates function call to `load` function
from Evaluate library.
* Migrate metric to Evaluate library in tf examples
Currently tensorflow examples use `load_metric` function from Datasets
library , commit migrates function call to `load` function to
Evaluate library.
Fix for #18306
* Migrate metric to Evaluate library in tf examples
Currently tensorflow examples use `load_metric` function from Datasets
library , commit migrates function call to `load` function to
Evaluate library.
Fix for #18306
* Migrate `metric` to Evaluate for all tf examples
Currently tensorflow examples use `load_metric` function from Datasets
library , commit migrates function call to `load` function to
Evaluate library.
Left the term fine-tuning since there is no correct translation into Italian and the English term is generally used. The same was done with some terms like "learning rate"
* start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch and should import it before use
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* add doc for perf_train_cpu_many
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Add files generated using transformer-cli add-new-model-like command
* Add changes for swinv2 attention and forward method
* Add fixes
* Add modifications for weight conversion and remaining args in swin model
* Add changes for patchmerging
* Add changes for SwinV2selfattention
* Update conversion script
* Add final fixes for the swin_v2 model
* Add changes for conversion script for pretrained window size case
* Add pretrained window size value from config in SwinV2Encoder class
* Make fixup
* Add swinv2 to models_not_in_readme to utils/check_copies.py
* Modify Swinv2v2 to Swin Transformer V2
* Remove copied from, to run make fixup command
* Add updates to swinv2tf from main branch
* Add pretrained_window_size to config, to make tests pass
* Add modified weights from nandwalritik profile for swinv2
* Update model weights from swinv2 from nandwalritik profile
* Add fix for build_pr_documentation CI fix
* Add fixes for weight conversion
* Add change to make input with padding work
* Add fixes for test cases
* Add few changes from swin to swinv2 to pass test cases
* Remove tests for tensorflow as swinv2 for TF is not added yet
* Overide test_pt_tf_model_equivalence function as TF implementation for swinv2 is not added yet
* Add modeling_tf_swinv2 to _ignore_modules as test file is removed for this one right now.
* Update docs url for swinv2 in README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Undo changes for check_repo
* Update url in readme.md
* Remove overrided function to test pt_tf_model_equivalence
* Remove TF model imports for Swinv2 as its not implemented in this PR
* Add changes for index.mdx
* Add swinv2 papers link,abstract and contributors details
* Rename cpb_mlp to continous_position_bias_mlp
* Add tips for swinv2 model
* Update src/transformers/models/swinv2/configuration_swinv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update src/transformers/models/swinv2/configuration_swinv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Fix indentation for docstring example in src/transformers/models/swinv2/configuration_swinv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update import order in src/transformers/models/swinv2/configuration_swinv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add copyright statements in weights conversion script.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Remove Swinv2 from models_not_in_readme
* Reformat code
* Remove TF implementation file for swinv2
* Update start docstring.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add changes for docstring
* Update orgname for weights to microsoft
* Remove to_2tuple function
* Add copied from statements wherever applicable
* Add copied from to Swinv2ForMaskedImageModelling class
* Reformat code.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add unittest.skip(with reason.) for test_inputs_embeds test case.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add updates for test_modeling_swinv2.py
* Add @unittest.skip() annotation for clarity to create_and_test_config_common_properties function
* Add continuous_position_bias_mlp parameter to conversion script
* Add test for testing masked_image_modelling for swinv2
* Update Swinv2 to Swin Transformer v2 in docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update Swinv2 to Swin Transformer v2 in docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add suggested changes
* Add copied from to forward methods of Swinv2Stage and Swinv2Encoder
* Add push_to_hub flag to weight conversion script
* Change order or Swinv2DropPath class
* Add id2label mapping for imagenet 21k
* Add updated url for SwinV2 functions and classes used in implementation
* Update input_feature dimensions format, mentioned in comments.
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
* Add suggested changes for modeling_swin2.py
* Update docs
* Remove create_and_test_config_common_properties function, as test_model_common_attributes is sufficient.
* Fix indentation.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes for making Nit objects in code style
* Add suggested changes
* Add suggested changes for test_modelling_swinv2
* make fix-copies
* Update docs/source/en/model_doc/swinv2.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fixes torch jit tracing for LayoutLMv2 model.
Pytorch seems to reuse memory for input_shape which caused a mismatch in shapes later in the forward pass.
* Fixed code quality
* avoid unneeded allocation of vector for shape
* add info about megatron training
* upload models and datasets from CodeParrot organization
* upload models and datasets from CodeParrot organization
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* fix typo and add comment about codeparrot vs megatron
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Improve docs
* Improve docs of speech one as well
* Apply suggestions from code review
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Removes a duplicated instantiation of device. I removed the second instance of the line to maintain code alignment with the GPT-J implementation of forward.
* Update index
* Translate to Spanish two sections from custom_models
* Translate to Spanish custom models documentation
* Fixing typos and grammatical errors
* Add requested changes from reviewer
* [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial
* Delete docs/source/pt-br directory
* [ fast_tokenizers.mdx ] - Continuing work on file
* [ fast_tokenizers.mdx ] - Continuing work on file
* Add fast tokenizers to _toctree.yml
* Eliminated config and toctree.yml
* Nits in fast_tokenizers.mdx
* Finishing create_a_model
* [ create_a_model.mdx ] finishing create a model in pt-br
* [ Changing _toctree.yml ] adding create a model in pt
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Ensure value and attn weights have the same dtype
* Remove prints
* Modify decision transformers copied from gpt2
* Nit device
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Fix style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add serving_output and serving methods to some vision models
* Add serving outputs for DeiT
* Don't convert hidden states - differing shapes
* Make saveable
* Fix up
* Make swin saveable
* Add in tests
* Fix funnel tests (can't convert to tensor)
* Fix numpy call
* Tidy up a bit
* Add in hidden states - resnet
* Remove numpy
* Fix failing tests - tensor shape and skipping tests
* Remove duplicated function
* PR comments - formatting and var names
* PR comments
Add suggestions made by Joao Gante:
* Use tf.shape instead of shape_list
* Use @tooslow decorator on tests
* Simplify some of the logic
* PR comments
Address Yih-Dar Sheih comments - making tensor names consistent and make types float
* Types consistent with docs; disable test on swin (slow)
* CI trigger
* Change input_features to float32
* Add serving_output for segformer
* Fixup
Co-authored-by: Amy Roberts <amyeroberts@users.noreply.github.com>
* Change how `take_along_axis` is computed in DeBERTa to stop confusing XLA
* Greatly simplify take_along_axis() since the code wasn't using most of it
* First commit
* final changes
* Changed create_model to create_a_model
Translated into crea un'architettura personalizzata in the file it/_toctree.yml
* Added _toctree.yml in the italian translation loca: serialization title Esporta modelli transformers
* Edit translation for create_model.mdx
* t with '#' will be ignored, and an empty message aborts the commit.
* Added file serialization for translation in italian
* Fix toctree serialization position
I checked the eng toctree and realized I made a mistake.
* Update _toctree.yml
Correct spacing
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add: segformer utils and img. classification.
* add: segmentation layer.
* feat: working implementation of segformer.
* chore: remove unused variable.
* add test, remaining modifications.
* remove: unnecessary files.
* add: rest of the files.
Co-authored-by: matt <rocketknight1@gmail.com>
* chore: remove ModuleList comment.
* chore: apply make style.
* chore: apply make fixup-copies.
* add to check_repo.py
* add decode head to IGNORE_NON_TESTED
* chore: run make style.
* chore: PR comments.
* chore: minor changes to model doc.
* tests: reduction across samples.
* add a note on the space.
* sort importats.
* fix: reduction in loss computation.
* chore: align loss function with that of NER.
* chore: correct utils/documentation_tests.txt
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* chore: simplify the interpolation of logits in loss computation.
* chore: return transposed logits when return_dict=False.
* chore: add link to the tf fine-tuning repo.
* address pr comments.
* address niels's comments.
* remove from_pt=True since tf weights are in.
* remove comment from pt model.
* address niels's comments.
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Run_scripts Italian translation gh-17459
* Updated run_scripts gh-17642
* Updated run_scripts gh-17642
Made the text more gender-neutral.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* More informative error message
* raise dynamic error
* remove_excess_nesting application
* incorrect shape assertion for collator & function to remove excess nesting from DatasetDict
* formatting
* eliminating datasets import
* removed and relocated remove_excess_nesting to the datasets library and updated docs accordingly
* independent assert instructions
* inform user of excess nesting
* Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API
* Support loading checkpoints saved with SMP < 1.10 in SMP < 1.10 and SMP >= 1.10
* Support loading checkpoints saved with SMP >= 1.10 in SMP >= 1.10
* Fix bug and styling
* Update based on reviewer feedback
* Initial work
* More work
* Add tests for custom pipelines on the Hub
* Protect import
* Make the test work for TF as well
* Last PyTorch specific bit
* Add documentation
* Style
* Title in toc
* Bad names!
* Update docs/source/en/add_new_pipeline.mdx
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Auto stash before merge of "custom_pipeline" and "origin/custom_pipeline"
* Address review comments
* Address more review comments
* Update src/transformers/pipelines/__init__.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Adding support for `device_map` directly in `pipeline(..)` function.
* Updating the docstring.
* Adding a better docstring
* Put back type hints.
* Blacked. (`make fixup` didn't work ??!!)
* Report value for a step instead of epoch.
Report an objective function value for a step instead of epoch to optuna.
I made this modification for the following reason:
If "eval_steps" is less than steps per epoch, there maybe warnings like this: "optuna/trial/_trial.py:592: UserWarning: The reported value is ignored because this `step` 0 is already reported.". So "step" are more appropriate than "epoch" here.
* MOD: make style.
Co-authored-by: zhaowei01 <zhaowei01@yuanfudao.com>
* fix tolerance for a bloom slow test
* enhance alibi padding
- get rid of for loops
- deals better with padded batched input
- avoid useless cpu/gpu communication when creating alibi
Co-authored-by: justheuristic <justheuristic@gmail.com>
* optimize attention mask
* fix scaled softmax limit values
* optimize building alibi tensor
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix attention_mask shape when it's None
* minor fixes
- fix docstring + arg names
* remove colons in docstring
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* apply suggestion
* remove unsued arg
* refactor a bit
- use [:, None] for consistency
* refactor attention block
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
* quick fixes
* first attempt
* refactor attention block and fix all tests except "test_simple_generation"
- added comments to better explain attention block
* remove debug lines and add TODO comment
* change `torch.bmm` to `torch.baddbmm`
- fixes `test_simple_generation`but breaks `test_batch_generation_padd`
* styling
* all tests are passing now
- use `bmm`
- add explanation for `allow_fp16_reduced_precision_reduction`
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* styling
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* fix support for accelerate
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove attn softmax in fp32
* refactor comments
* refactor a bit
- remove warning message
- remove print on test
* refer to pytorch t5
* change the slow tests
- do the tests in fp32
- remove some comments
- keep large comments
* update expected output for `test_simple_generation`
- we now test using fp32
* make style + change comments a bit
* fix dtype padd test
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Younes Belkada <younesbelkada@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix RESOURCE_EXHAUSTED error for large datasets on Flax example scripts
* using np.permutation for creating batch_idx
* train_samples_idx -> training_samples_idx
* fix type hints
* Fix type issue in using bucketing with Trainer
- Fix type issues in LengthGrouperSampler,
DistributedLengthGroupedSampler
refs: #18003
* Change logging type in LengthGroupedSampler
- Change `logger.warning` to `logger.info`
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Change logging type in DistributedLengthGroupedSampler
- Change `logger.warning` to `logger.info`
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove adundant clause in LengthGroupedSampler
- Use `elif`
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove adundant clause in DistributedLengthGroupedSampler
- Use `elif`
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply black, isort to modified codes in the script
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Drop columns after loading samples, rather than before, to avoid breaking transforms
* make fixup
* Add workaround so this PR can work with current datasets version
* Return scalar losses instead of per-sample means
* Make loss shape (1,) instead of scalar
* Allow scalar losses in test_loss_computation
* Allow scalar losses in test_loss_computation
* Allow scalar losses in test_loss_computation
* Remove XLA loss function for RAG
* Refactor to inherit from nn.Module instead of nn.ModuleList
* Fix typo
* Empty to trigger CI re-run
Blender Bot tests failing (should be unrelated to this PR) and pass locally). I don't have sufficient permisisons to re-run the CI workflow (totally or from failed)
* Rought TF conversion outline
* Tidy up
* Fix padding differences between layers
* Add back embedder - whoops
* Match test file to main
* Match upstream test file
* Correctly pass and assign image_size parameter
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Add in MainLayer
* Correctly name layer
* Tidy up AdaptivePooler
* Small tidy-up
More accurate type hints and remove whitespaces
* Change AdaptiveAvgPool
Use the AdaptiveAvgPool implementation by @Rocketknight1, which correctly pools if the output shape does not evenly divide by input shape c.f. 9e26607e22 (r900109509)
Co-authored-by: From: matt <rocketknight1@gmail.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Use updated AdaptiveAvgPool
Co-authored-by: matt <rocketknight1@gmail.com>
* Make AdaptiveAvgPool compatible with CPU
* Remove image_size from configuration
* Fixup
* Tensorflow -> TensorFlow
* Fix pt references in tests
* Apply suggestions from code review - grammar and wording
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add TFResNet to doc tests
* PR comments - GlobalAveragePooling and clearer comments
* Remove unused import
* Add in keepdims argument
* Add num_channels check
* grammar fix: by -> of
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Remove transposes - keep NHWC throughout forward pass
* Fixup look sharp
* Add missing layer names
* Final tidy up - remove from_pt now weights on hub
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Exclude Databricks from notebook env only if the runtime is below 11.0
* Dummy commit to trigger CI
* Empty commit to trigger CI
* Empty commit to trigger CI
* Empty commit to trigger CI
* Empty commit to trigger CI
* Empty commit to trigger CI
* Empty commit to trigger CI
* Empty commit to trigger CI
* Shifting labels for causal LM when using label smoother
When training CausalLM, loss is computed within model's foward() function and
labels are shifted internally. However, if label smoothing is applied, loss is
computed in trainer's compute_loss function and labels are not shifted.
This causes unintended confusion during the alignment of labels and corresponding
inputs. This commit is for resolving this confusion.
Resolves#17960
On branch shift_labels_for_causalLM
Changes to be committed:
modified: src/transformers/trainer.py
modified: src/transformers/trainer_pt_utils.py
* Update trainer.py
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Copy inputs to train and test step before modifying them, as this breaks things
* Add XLA tests, fix our loss functions to be XLA-compatible
* make fixup
* Update loss computation test to expect vector of per-sample losses
* Patch loss for TFLED
* Patch loss for TFAlbert
* Add a tf_legacy_loss config flag that enables old loss functions
* Stop using config.get() because it's not a dict
* Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it
* make fixup
* Add XLA-compatible RAG loss
* Fix dtype of loss mask for TFAlbert
* Fix test for XLNet too because it overrides the default one
* make fixup
* Fix config test
* No more depending on GPU NaN behaviour
* Add test, avoid potential zero division
* Fix test item assignment
* Fix loss computation masking test
* make fixup
* Fix dtype bugs
* [Flax] Add remat (gradient checkpointing)
* fix variable naming in test
* flip: checkpoint using a method
* fix naming
* fix class naming
* apply PVP's suggestions from code review
* make fix-copies
* fix big-bird, electra, roberta
* cookie-cutter
* fix flax big-bird
* move test to common
* add onnx support for BLOOM
* use TYPE_CHECKING for type annotations
* fix past_shape for bloom (different from gpt2)
* use logical_or instead of `+` for onnx support
* bigger `atol_for_validation` for larger bloom models
* copied -> taken because it's no longer an exact copy
* remove "copied from" comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* sharded conversion; add flag to control max hidden error
* better hidden name matching
* Add test: load TF from PT shards
* fix test (PT data must be local)
* first draft adding Flax-t5-encoder and Flax-mt5-encoder
* imports
* after make fixup
* flax t5 encoder test
* black on test
* make fix-copies
* clean
* all_model_classes -> tuple
* clean test
* is_encoder_decoder=False in t5-enc tester
* remove file docstring before FlaxT5Encoder
* black
* isort
* commit suggestions on src/transformers/models/t5/modeling_flax_t5.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* commit suggestions on src/transformers/models/t5/modeling_flax_t5.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* remove _get_encoder_module
* self.decoder_seq_length -> self.encoder_seq_length as t5-enc does not have decoder
* bugfix - self.module_class is class itself, not instance;
* docs for mt5 and t5
* call -> __call__ in t5 doc
* FlaxMT5EncoderModel to TYPE_HINT
* run doc-builder to allow change the files
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Revert "Skip failing test until they are fixed."
This reverts commit 8f400775fc5bc1011a2674dcfd5408d30d69f678.
* Use `tiny-detr` checkpts from `hf-internal-testing`
* chore: initial commit
Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets.
* chore: porting the rest of the modules to tensorflow
did not change the documentation yet, yet to try the playground on the model
* Fix initilizations (#1)
* fix: code structure in few cases.
* fix: code structure to align tf models.
* fix: layer naming, bn layer still remains.
* chore: change default epsilon and momentum in bn.
* chore: styling nits.
* fix: cross-loading bn params.
* fix: regnet tf model, integration passing.
* add: tests for TF regnet.
* fix: code quality related issues.
* chore: added rest of the files.
* minor additions..
* fix: repo consistency.
* fix: regnet tf tests.
* chore: reorganize dummy_tf_objects for regnet.
* chore: remove checkpoint var.
* chore: remov unnecessary files.
* chore: run make style.
* Update docs/source/en/model_doc/regnet.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* chore: PR feedback I.
* fix: pt test. thanks to @ydshieh.
* New adaptive pooler (#3)
* feat: new adaptive pooler
Co-authored-by: @Rocketknight1
* chore: remove image_size argument.
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: matt <rocketknight1@gmail.com>
* Empty-Commit
* chore: remove image_size comment.
* chore: remove playground_tf.py
* chore: minor changes related to spacing.
* chore: make style.
* Update src/transformers/models/regnet/modeling_tf_regnet.py
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
* Update src/transformers/models/regnet/modeling_tf_regnet.py
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
* chore: refactored __init__.
* chore: copied from -> taken from./g
* adaptive pool -> global avg pool, channel check.
* chore: move channel check to stem.
* pr comments - minor refactor and add regnets to doc tests.
* Update src/transformers/models/regnet/modeling_tf_regnet.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* minor fix in the xlayer.
* Empty-Commit
* chore: removed from_pt=True.
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: amyeroberts <aeroberts4444@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Fixing a regression with `return_all_scores` introduced in #17606
- The legacy test actually tested `return_all_scores=False` (the actual
default) instead of `return_all_scores=True` (the actual weird case).
This commit adds the correct legacy test and fixes it.
Tmp legacy tests.
Actually fix the regression (also contains lists)
Less diffed code.
* Add a TF in-graph tokenizer for BERT
* Add from_pretrained
* Add proper truncation, option handling to match other tokenizers
* Add proper imports and guards
* Add test, fix all the bugs exposed by said test
* Fix truncation of paired texts in graph mode, more test updates
* Small fixes, add a (very careful) test for savedmodel
* Add tensorflow-text dependency, make fixup
* Update documentation
* Update documentation
* make fixup
* Slight changes to tests
* Add some docstring examples
* Update tests
* Update tests and add proper lowercasing/normalization
* make fixup
* Add docstring for padding!
* Mark slow tests
* make fixup
* Fall back to BertTokenizerFast if BertTokenizer is unavailable
* Fall back to BertTokenizerFast if BertTokenizer is unavailable
* make fixup
* Properly handle tensorflow-text dummies
* Add CodeGen model
* Add missing key and switch order of super()
* Fix torch.ones init with uint8 instead of bool
* Address comments: copy statements and doc
* update tests
* remove old model parallel
* fix batch gen tests
* fix batch gen test
* update test_gpt2_sample_max_time
* fix codgen test and revert gpt2 test change
* Fix incorrect tie_word_embedding value, typo, URL
* Fix model order in README and styling
* Reorder model list alphabetically
* Set tie_word_embedding to False by default
* Apply suggestions from code review
* Better attn mask name & remove attn masked_bias
* add tokenizer for codegen
* quality
* doc tokenizer
* fix-copies
* add CodeGenTokenizer in converter
* make truncation optional
* add test for truncation
* add copyright
* fix-copies
* fix fast tokenizer decode
* Update src/transformers/models/codegen/tokenization_codegen.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* increase vocab_size in tests
Co-authored-by: patil-suraj <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix tests that broke when models used batchnorm
* Initializing the model twice does not actually...
...give you the same weights each time.
I am good at machine learning.
* Fix speed regression
* few fixes:
- hardcode tokenizer padding side
- remove unused args
* few fixes:
- added new attribute on TokenizerTesterMixin
- added new slow test
- remove unused arg on tokenizer class
* make style
* Update src/transformers/models/bloom/tokenization_bloom_fast.py
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* make quality
* apply changes
- remove new attribute
- redefine test on the class
* add comments
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* add skeleton files
* fix cpu inference link
* add hint to make clear that single gpu section contains general info
* add new files to ToC
* update toctree to have subsection for performance
* add "coming soon" to the still empty sections
* fix missing title
* fix typo
* add reference to empty documents
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Feat: add missing type hints for QDQBertModel
* fix: ran black and isort
* feat: Add missing output type for QDQBertModel
* feat: Add type hints for QDQBertLMHeadModel and models starting with QDQBertFor
* fix: add missing return type for QDQBertModel
* fix: remove wrong return type for QDQBertEmbeddings
* fix: readded config argument to load_tf_weights_in_qdqbert
* fix: add BertConfig type to BertEmbeddings config due t checko error in ci
* fix: removed config type hints to avoid copy checks
* Add logits_processor parameter, used by `generate`, to `Seq2SeqTrainer` methods `evaluate` and `predict`
* Add all generate parameters to `Seq2SeqTrainer`, and also to `QuestionAnsweringSeq2SeqTrainer` which overrides it
* Remove `self._num_beams` from trainer classes
* - Run fixup
- Fix "Constraint" not exposed
- Fix synced_gpus to actually read from param
* Use kwargs
* Copy kwargs before making changes to it
* Fix style issues unused imports
- Fix `top_k_top_p_filtering` not passing `filter_value` to
`TopPLogitsWarper` causing any top-p filtered logits to be -inf
instead of specified value
- Add corresponding test
* Add final_layer_norm to OPT model
* Add JAX and TF version
* Fix Keras name
* Woops
* Allow for non breaking change
* Apply suggestions from code review
* add tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* rename to check_pt_flax_outputs
* update check_pt_flax_outputs
* use 5e-5 for BigBird PT/Flax test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Prepare CI for v0.8.0
* pin hfh (revert before merge)
* Revert "pin hfh (revert before merge)"
This reverts commit a0103140e1c77b810ffcb735192968bc03be3e1f.
* Test rc3
* Test latest rc
* Unpin to the RC
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Fix docstrings and variable names
* Rename x to something better
* Improve messages
* Fix docstrings and add test for greyscale images
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* deduplication draft
* update style
* update style test
* dummy test main
* rename modules
* rename functions
* return extremes in deduplicate_clusters
* update style
* cast str for gzip
* update doc string
* time processing
* use dataset map to compute minhash
* fill value for short token
* remove da map method
* update style
* use share object to multiprocess
* update style
* use f-string and minor fix
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>
* update style
* use module parameters
* change ds_dedup to ds_filter
* save ds_dedup
* mv test to script tests
* make jaccard threshold a parameter of deduplicate_dataset
* update style
* add doc strings
* update style
* add doc string for DuplicationIndex
* save files into data dir
* update readme
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>
* make near deduplication optional
* move near deduplication in README
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* use f string
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna Ben Allal <44069155+loubnabnl@users.noreply.github.com>
On line 180, `torch.tensor(-1.0, xxx)` gives the error "TypeError: 'float' object cannot be interpreted as an integer"
This is because the dtype here is `int64`. For `dtype=int64`, this needs to simply be `-1`.
This impacts the long-t5-tglogbal-x model. It does not impact the long-t5-local-x version which does not appear to call this line.
* Use torch.finfo(self.dtype).min
* for GPTNeoX
* for Albert
* For Splinter
* Update src/transformers/models/data2vec/modeling_data2vec_audio.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix -inf used in Bart-like models
* Fix a few remaining -inf
* more fix
* clean up
* For CLIP
* For FSMT
* clean up
* fix test
* Add dtype argument and use it for LayoutLMv3
* update FlaxLongT5Attention
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Added translation of installation.mdx to Portuguese, as well
as default templates of _toctree.yml and _config.py
* [ build_documentation.yml ] - Updated doc_builder to build
documentation in Portuguese.
[ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.
* [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.
[ pipeline_tutorial.mdx ] - Grammar changes.
* [ accelerate.mdx ] - Translated to Portuguese the acceleration tutorial.
* [ multilingual.mdx ] - Added portuguese translation for multilingual tutorial.
[ training.mdx ] - Added portuguese translation for training tutorial.
* [ preprocessing.mdx ] - WIP
* Update _toctree.yml
* Adding Pré-processamento to _toctree.yml
* Update accelerate.mdx
* Nits and eliminate preprocessing file while it is ready
* [ index.mdx ] - Translated to Portuguese the index apresentation page.
* [ docs/source/pt ] - Updated _toctree.yml to match newest translations.
* Fix build_pr_documentation.yml
* Fix index nits
* nits in _toctree
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Fix eval to compute rouge correctly for rouge_score
* styling
* moving sentence tokenization to utils from run_eval
* saving ckpt in mlflow
* use existing format of args
* fix documentation
Co-authored-by: Swetha Mandava <smandava@nvidia.com>
* Migrate HFDeepSpeedConfig from trfrs to accelerate
* add `accelerate` to testing dep
* addressing comments
* addressing comments
Using `_shared_state` and avoiding object creation. This is necessary as `notebook_launcher` in `launcers.py` checks `len(AcceleratorState._shared_state)>0` to throw an error.
* resolving comments
1. Use simple API from accelerate to manage the deepspeed config integration
2. Update the related documentation
* reverting changes and addressing comments
* docstring correction
* addressing nits
* addressing nits
* addressing nits 3
* bumping up the accelerate version to 0.10.0
* resolving import
* update setup.py to include deepspeed dependencies
* Update dependency_versions_table.py
* fixing imports
* reverting changes to CI dependencies for "run_tests_pipelines_tf*" tests
These changes didn't help with resolving the failures and I believe this needs to be addressed in another PR.
* removing `accelerate` as hard dependency
Resolves issues related to CI Tests
* adding `accelerate` as dependency for building docs
resolves failure in Build PR Documentation test
* adding `accelerate` as dependency in "dev" to resolve doc build issue
* resolving comments
1. adding `accelerate` to extras["all"]
2. Including check for accelerate too before import HFDeepSpeedConfig from there
Co-Authored-By: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* resolving comments
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rembert: fix python codeblock
* rembert: use correct google/rembert checkpoint name in documentation
* rembert: use correct google/rembert checkpoint name in TF documentation
* add new bloom classes
* (feat) add bloom classification tests; make style
* style: change import in test
* add some typehints to bloom classes
* merge main into branch
* fix: input checking in bloom seq classification
* fix tests
* change model class tests
* fix few tests
- more tests should pass
- one test left
* make token classifier return hidden states
* style: make BLOOM typehints consistent
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Initial commit
* Make some fixes
* Make PT model full forward pass
* Drop TF & Flax implementation, fix copies etc
* Add Flax model and update some corresponding stuff
* Drop some TF things
* Update config and flax local attn
* Add encoder_attention_type to config
* .
* Update docs
* Do some cleansing
* Fix some issues -> make style; add some docs
* Fix position_bias + mask addition + Update tests
* Fix repo consistency
* Fix model consistency by removing flax operation over attn_mask
* [WIP] Add PT TGlobal LongT5
* .
* [WIP] Add flax tglobal model
* [WIP] Update flax model to use the right attention type in the encoder
* Fix flax tglobal model forward pass
* Make the use of global_relative_attention_bias
* Add test suites for TGlobal model
* Fix minor bugs, clean code
* Fix pt-flax equivalence though not convinced with correctness
* Fix LocalAttn implementation to match the original impl. + update READMEs
* Few updates
* Update: [Flax] improve large model init and loading #16148
* Add ckpt conversion script accoring to #16853 + handle torch device placement
* Minor updates to conversion script.
* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM
* gpu support + dtype fix
* Apply some suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* * Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies
* Remove caching logic for local & tglobal attention
* Apply another batch of suggestions from code review
* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx
* Fix converting script + revert config file change
* Revert "Remove caching logic for local & tglobal attention"
This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.
* Stash caching logic in Flax model
* Make side relative bias used always
* Drop caching logic in PT model
* Return side bias as it was
* Drop all remaining model parallel logic
* Remove clamp statements
* Move test files to the proper place
* Update docs with new version of hf-doc-builder
* Fix test imports
* Make some minor improvements
* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray
* Fix TGlobal for ONNX conversion + update docs
* fix _make_global_fixed_block_ids and masked neg value
* update flax model
* style and quality
* fix imports
* remove load_tf_weights_in_longt5 from init and fix copies
* add slow test for TGlobal model
* typo fix
* Drop obsolete is_parallelizable and one warning
* Update __init__ files to fix repo-consistency
* fix pipeline test
* Fix some device placements
* [wip]: Update tests -- need to generate summaries to update expected_summary
* Fix quality
* Update LongT5 model card
* Update (slow) summarization tests
* make style
* rename checkpoitns
* finish
* fix flax tests
Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>
* enable cpu distribution training using mpirun
*command like
* mpirun -n 2 python3 run_qa.py --no_cuda --xpu_backend ccl xxxx
*MASTER_ADDR and MASTER_PORT should be set as env
*export MASTER_ADDR=127.0.0.1
*export MASTER_PORT=29500
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* fix according to the review comment
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* use accelerate logic for cpu distribution training to set "RANK","LOCAL_RANK","WORLD_SIZE" environment
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* allow scope from trainer arg
* add ray_scope to training args
* escape double quotes
* make style && quality
* attempt to solve doc style issues
* splitting up URLs for style
* make fixup
* Update src/transformers/training_args.py
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
* make style
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
I'm guessing that the intention was to have the `_no_split_modules` class attribute for `GPTNeoXPreTrainedModel` to be set to `["GPTNeoXLayer"]`, akin to how its set as `["GPTJBlock"]` for `GPTJPreTrainedModel`.
If this is incorrect, please feel free to just close the PR.
Thanks!
* Raise RepoNotFoundError in case of 401
* Include changes from revert-17646-skip_repo_not_found
* Add a comment
* 💄 Code quality
* 💚 Update `get_from_cache` test
* 💚 Code quality & skip failing test
VisibleDeprecationWarning is addressed by specifying dtype=object when creating numpy array.
Update code based on review feedback.
Undo whitespace changes to tokenization_utils_base.py.
Co-authored-by: I like data <ilikedata@nym.hush.com>
When we're preparing the tensors for CPU for postprocessing, we need
to upgrade the `float16` to `float32` since CPUs don't have instructions
for `[b]float16`.
* Adding `top_k` and `sort` arguments to `text-classification` pipeline.
- Deprecate `return_all_scores` as `top_k` is more uniform with other
pipelines, and a superset of what `return_all_scores` can do.
BC is maintained though.
`return_all_scores=True` -> `top_k=None`
`return_all_scores=False` -> `top_k=1`
- Using `top_k` will imply sorting the results, but using no argument
will keep the results unsorted for backward compatibility.
* Remove `sort`.
* Fixing the test.
* Remove bad doc.
* Use shape_list to safely get shapes
* Add relevant test
* Tidy and add metrics
* Resolve dynamic shaping issues and move test
* Tidy up and all samples in batch
* Formatting
* adding template
* update model
* model update
* update conf for debug model
* update conversion
* update conversion script
* update conversion script
* fix missing keys check
* add tests to test the tokenizer in the local machine
* Change variable name
* add tests on xnli dataset
* add more description
* add descriptions + clearer code
* clearer code
* adding new tests + skipping few tests because of env problems
* change comment
* add dtype on the configuration
* add test embeddings
* add hardcoded test
* fix dtype issue
* adding torch.float16 to config
* adding more metrics (min, max, mean)
* add sum
* now the test passes with almost equal
* add files for conversion - test passes on cpu gpu
* add final changes
* cleaning code
* add new args in the docstring
* fix one liner function
* remove macros
* remove forward attention
* clean up init funtion
* add comments on the issue
* rm scale mask softmax
* do make style
* fix dtype in init
* fixing for loop on att probs
* fix style with black
* fix style + doc error
* fix and debug CI errors (docs + style)
* some updates
- change new operations
- finally add scaled softmax
- added new args in the config
* make use cache working
* add changes
- save sharded models
- final changes on the modeling script
* add changes
- comment on alibi
- add TODO on seq length
* test commit
- added a text to test the commit
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* final changes
- attention mask change
- generation works on BS176b
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* changes - model + conversion
* move to correct dir
* put ,
* fex fixes
* fix tokenizer autodoc
* fix minor CI issues
* fix minor CI issues
* fix minor CI issues
* fix style issue
* fix minor import issues
* fix few issues
* remove def main on the test
* add require torch
* replace decorator with 'with'
* fix style
* change to bloom
* add quick fix tokenizer
* fix tokenizer file
* fix tokenizer
- merge tests
- small fixes
* fix import issue
* add bloom to readme
* fix consistency
* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
fix comment issues on file headers
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix doc issue
* small fix - modeling test
* some changes
- refactor some code
- taking into account reviews
- more tests should pass
- removed pruning tests
* remove useless division
* more tests should pass
* more tests should pass
* more tests should pass
* let's try this one
-add alibi offset
- remove all permutes to make the grad operations work
- finger crossed
* refactor
- refactor code
- style changes
- add new threshold for test
* major changes
- change BLOOM to Bloom
- add quick doc on bloom.mdx
- move embeddings test on modeling test
* modify readme
* small fixes
* small fix
- better threshold for a test
* remove old test file from fetcher
* fix small typo
* major change
- change BloomLMHead to BloomForCausalLM
* remove onnx config
* major changes
- refactor the code
- remove asserts
- change tol for test
* make style
* small change
* adding a slow test + commenting old ones for now
* make style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* fix duplicates
* cleaning comments on config
* clean a bit conversion file
* refacor a bit modeling file
* refactor tokenizer file
* fix tokenization test issue
* fix tokenization issue #2
* fix tokenization issue second try
* fix test issue
* make style + add suggestions
* change test fetcher
* try this one
- slow tests should pass
- finger crossed
* possible final changes
* make style
* try fix padding side issue
* fix side
* fix padding issue
* fix ko-readme
* fix config auto
* cleaning modeling file
* keep bloom in caps in ko
* update config docs
* remove pretraining_pp
* remove model parallel
* update config
- add correct config files
* fix duplicates
* fix fetcher
* fix refactor issue
- remove divide function
* try to remove alibi
* small fixes
- fix alibi
- remove seq length
- refactor a bit the code
* put correct values
- fix bos and eos token ids
* fix attention mask loop
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
* small fixes:
- remove skip bias add
* small fixes
- fix typo in readme
- fix typos in config
* small changes
- remove a test
- add reconstruction test
- change config
* small changes
- change Scaled Softmax to BloomScaledSoftmax
* small fixes
- fix alibi dtype
* major changes
- removing explicit dtype when loading modules
- fixing test args (torch_dtype=auto)
- add dosctring
* fix readmes
* major changes
- now bloom supports alibi shifting
- refactor a bit the code
- better test tolerance now
* refactor a bit
* refactor a bit
* put correct name on test
* change docstring
* small changes
- fix docstring modeling
- fix test tolerance
* fix small nit
- take dtype from tensors in the conversion script
* minor fix
- fix mdx issue
* minor fix
- change config docstring
* forward contrib credits from PR14084
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply modifications
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* resolve softmax upcast
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
* final changes modeling
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Merge commit 'd156898f3b9b2c990e5963f5030a7143d57921a2'
* merge commit
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* apply suggestions
Apply suggestions from Stas comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix gradient checkpointing
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add slow but exact
* add accelerate compatibility
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
* forward contrib credits
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix torch device on tests
* make style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix nits
Co-authored-by: patrickvonplaten<patrickvonplaten@users.noreply.github.com>
* remove final nits
* fix doc
- add more details on the doc
- add links to checkpoints
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
* put test torchscript to false
* Update src/transformers/models/bloom/modeling_bloom.py
Co-authored-by: justheuristic <justheuristic@gmail.com>
* fix alibi
- create alibi only once
* add small doc
* make quality
* replace torch.nn
* remove token type emb
* fix fused op + output bias
* add fused op
- now can control fused operation from config
* remove fused op
* make quality
* small changes
- remove unsed args on config
- removed bias gelu file
- make the model torchscriptable
- add torchscript slow tests
* Update src/transformers/models/bloom/modeling_bloom.py
* fix slow
* make style
* add accelerate support
* add bloom to deepspeed tests
* minor changes
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* minor change
* slow tests pass
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/bloom.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* minor changes:
- change docstring
- add link to paper
Co-authored-by: Thomwolf <thomwolf@gmail.com>
Co-authored-by: Thomas Wolf <thomas@huggingface.co>
Co-authored-by: thomasw21 <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sIncerass <sheng.s@berkeley.edu>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: thomasw21 <thomasw21@users.noreply.github.com>
Co-authored-by: sgugger <sgugger@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrickvonplaten@users.noreply.github.com>
Co-authored-by: LysandreJik <LysandreJik@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
* feat: initial implementation of data2vec segmentation model in TF.
* chore: minor corrections to make the segmenter work.
* chore: removed unncessary files.
* chore: add tests and other modifications.
* fix: loss computation for segmentation.
* chore: remove unused variable.
* chore: formatting.
* added a dummy adaptive pooling layer.
* removed unnecessary file.
* potentially add identifiers to layer names.
* fix: layer naming.
* chore: removed unnecessary print.
* Skipping unneeded test
* chore: add logging to debug tolerance.
* fix: segmentation tests for tfdata2vecvision
* chore: make style.
* fix: layer names, assertion to be resolved.
* Bumping test tolerance a bit
* chore: bump the tol in PT test.
Co-authored-by: matt <rocketknight1@gmail.com>
* Stricter pt-to-tf checks; Update docker image for related tests
* check all attributes in the output
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* added cbs to notebooks, made copy-paste error fix in generation_utils
* initial push for mctc model
* mctc feature extractor done
* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
* added processor, tokenizer and their tests for MCTC. Have added an MCTC modeling test, adjusting model code accordingly.
* passing attention, now struggling to figure out how attention masks make sense here
* works when excluding attention masks. ask later how one would integrate attention maskshere
* bizarre configuration error (model prefix comes first in config dict json and messes up the order)
* all passing but bizzarre config dict ordering issue when to_dict
* passing all major tests
* feature extraction, processor, tokenizer added & tests passing
* style & consistency & other logistical fixes
* copy paste fix
* model after feature extraction working
* commiting final feature extraction results; need to fix normalization
* feature extraction passing tests; probably should add tests on the specific flashlight-copied functions?
* delete print ; format code a bit
* fixing tests
* passing major tests
* fixing styles
* completed tokenization test with real example; not sure if these values are entirely correct.
* last test fixes from local
* reverting accidentally included custom setup configs
* remove load tf weights; fix config error
* testing couldnt import featureextractor
* fix docs
* fix docs
* resolving comments
* style fixes
* style fixes
* Update to MCTCConv1dSubSampler
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* relposemb fixes
* conv1d name issue; expecting config fail with paraentheses
* fix config issue
* fix config issue
* fix config issue
* change everything to MCTCT
* fixing naming change errors
* archive list
* copyrights and docs
* copyrights and docs
* copyrights and docs
* merge resolution
* move tests, fix to changed optionaldependency structure
* test directories changed
* fixing tests
* how to avoid tf tests?
* how to avoid tf tests?
* tests passing locally
* allow mctctprocessor imported any env
* allow mctctprocessor imported any env
* fixed second round of feedback, need to fix docs
* doc changes not being applied
* all fixed
* style fix
* feedback fixes
* fix copies and feature extraction style fix
* Update tests/models/visual_bert/test_modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* copy paste huggingface:main visual bert
* added eof newline to visual bert; all tests are passing otherwise
* fix slow tests by adding attention mask
* change model id to speechbrain
* make fix-copies
* fix readme unwanted deletes
* fixing readmes, make fix-copies
* consistent M-CTC-T naming
* Update src/transformers/models/mctct/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* all fixed but variable naming
* adjust double quotes
* fixed variable names
* copyright and mr quilter
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct slow tests
* make fix-copies
* Update src/transformers/models/mctct/configuration_mctct.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/mctct/configuration_mctct.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* m-ctc-t not mctct
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Quicktour Portuguese Translation
Translated quicktour.mdx until line 161
* Finished translating quicktour.mdx
Ready to upload and adjust eventual .mdx or translation mistakes.
* Add _toctree.yml and fix nits
* Fixed pt-br mdx syntax problem
Closed <frameworkcontent> instance
* Changed </frameworkcontent> line
* Copied missing block from english version of quicktour.mdx
* Reviwed the entire file once again. It should be working now.
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Add examples telemetry
* Alternative approach
* Add to all other examples
* Add to templates as well
* Put framework separately
* Same for TensorFlow
* Add method to call to_tf_dataset() with column inference
* Add test for dataset creation
* Add a default arg for data collator
* Fix test
* Fix call with non-dev version of datasets
* Test correct column removal too
* make fixup
* More tests to make sure we remove unwanted columns
* Fix test to avoid predicting on unbuilt models
* Fix test to avoid predicting on unbuilt models
* Fix test to remove unwanted head mask columns from inputs
* Stop pushing your debug breakpoints to the main repo of the $2bn company you work for
* Skip the test in convnext because no grouped conv support
* Drop bools from the dataset dict
* Make style
* Skip the training test for models whose input dicts don't give us labels
* Skip transformerXL in the test because it doesn't return a simple loss
* Skip TFTapas because of some odd NaN losses
* make style
* make fixup
* Add docstring
* fixup
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove breakpoint from tests
* Fix assert, add requires_backends
* Protect tokenizer import with if TYPE_CHECKING
* make fixup
* Add noqa, more fixup
* More rearranging for ~* aesthetics *~
* Adding defaults for shuffle and batch_size to match to_tf_dataset()
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add the Italian translation of the file installation.mdx and edit _toctree
* Add the Italian translation of the file installation.mdx and edit _toctree
This PR updates our Expert Acceleration Program image with a new image featuring our experts.
This is similar to our Transformers/README.md image update that has proven to be successful.
* Add gated-silu to t5 architecture to support UL2
* Fix error message
* formatting
* formatting again
* refactor
* fix classnames in _init_weights
* remove is_gated
* add test
* fix test
* Try without the test?
* Add back the test.
* Improve error message.
Co-authored-by: Daniel Hesslow <daniel@lighton.ai>
* Implemented loss for training AudioFrameClassification
* reported changes in wav2vec2 main class and used make copies to propagate
* running black for code formatting
* print more lib. versions and just befor test runs
* update print_env_pt.py
* rename to print_env
* Disable warning + better job name
* print python version
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* add a test for a word only input
* make LukeForMaskedLM work without entity inputs
* update test
* add LukeForMaskedLM to MODEL_FOR_MASKED_LM_MAPPING_NAMES
* restore pyproject.toml
* empty line at the end of pyproject.toml
I think you mean to accept either an instance of `PreTrainedTokenizer` or `PreTrainedTokenizerFast` inside of the `pipeline(...)` factory function, if the `tokenizer` argument isn't a `str`.
* initial commit
* add init file
* update globakl init
* update index and dummy objects
* style
* update modelling auto
* fix initi typo in src/transformers
* fix typo in modeling tf auto, opt was in wrong mapping name
* fixed a slow test : saved_model
* style
* fix positionnal embedding if no position id is provided
* update tf test
* update test flax requirements
* fixed serialization
* update
* update tf name to allow smooth convertion
* update flax tests
* style
* fix test typo
* fix tf typo test
* add xla for generate support in causal LM
* fixed bug
* cleaned tf tests
* style
* removed from PT for slow tests
* fix typp
* opt test as slow
* trying to fix GPT2 undefined
* correct documentation and add to test doc
* update tf doc
* fix doc
* fake commit
* Apply suggestions from code review
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* update test based on review
* merged main layer for functionning test
* fixup + quality
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* update long comment
* make fix copies
Co-authored-by: Arthur <arthur@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Json dump] Make json prettier
* correct more tokenizeirs
* more patterns
* add aggressive test
* the aggressive test was actually useful :-)
* more tests
* Apply suggestions from code review
* Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens()
For tokenizers with a small number of special tokens or special tokens
with consecutive token IDs, this reduces the time complexity of creating
the trie from quadratic to linear, see also #16936.
* Extend explanation of batching added tokens
* Setup for Italian translation and add first document
- Add 'it' folder for files translated into Italian
- Add _config.py and _toctree.yml files
- Add translation of quicktour.mdx
* Fix style issue of italian documentation files
* Add 'it' to the languages section in the .github/workflows
* Remove - installation from _toctree for Italian
* Translation for index file
- Add index to _toctree.yml
- Add translation of index.mdx
* Fix typo in docs/source/it/index.mdx
* Translate code comments in docs/source/it/_config.py
Co-authored-by: Martina Fumanelli <martinafumanelli@Martinas-MBP.homenet.telecomitalia.it>
* Add onnx configuration for xlm
* Add supported features for xlm
* Add xlm to models exportable with onnx
* Add xlm architecture to test file
* Modify docs
* Make code quality fixes
* Support for Bart and LayoutLM, and partial support for XLNet
* Support for mbart
* A lot of new models supported
* Support for other models
* LayoutLM fix
* Use strings instead of classes
* Make forward pass work
* More improvements
* Remove unused imports
* Remove timm dependency
* Improve loss calculation of token classifier
* Fix most tests
* Add docs
* Add model integration test
* Make all tests pass
* Add LayoutLMv3FeatureExtractor
* Improve integration test + make fixup
* Add example script
* Fix style
* Add LayoutLMv3Processor
* Fix style
* Add option to add visual labels
* Make more tokenizer tests pass
* Fix more tests
* Make more tests pass
* Fix bug and improve docs
* Fix import of processors
* Improve docstrings
* Fix toctree and improve docs
* Fix auto tokenizer
* Move tests to model folder
* Move tests to model folder
* change default behavior add_prefix_space
* add prefix space for fast
* add_prefix_spcae set to True for Fast
* no space before `unique_no_split` token
* add test to hightligh special treatment of added tokens
* fix `test_batch_encode_dynamic_overflowing` by building a long enough example
* fix `test_full_tokenizer` with add_prefix_token
* Fix tokenizer integration test
* Make the code more readable
* Add tests for LayoutLMv3Processor
* Fix style
* Add model to README and update init
* Apply suggestions from code review
* Replace asserts by value errors
* Add suggestion by @ducviet00
* Add model to doc tests
* Simplify script
* Improve README
* a step ahead to fix
* Update pair_input_test
* Make all tokenizer tests pass - phew
* Make style
* Add LayoutLMv3 to CI job
* Fix auto mapping
* Fix CI job name
* Make all processor tests pass
* Make tests of LayoutLMv2 and LayoutXLM consistent
* Add copied from statements to fast tokenizer
* Add copied from statements to slow tokenizer
* Remove add_visual_labels attribute
* Fix tests
* Add link to notebooks
* Improve docs of LayoutLMv3Processor
* Fix reference to section
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Initial work
* More or less finished with first draft
* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix randomly initialized weights
* Update src/transformers/modeling_utils.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Address review comments
* Rename DeepSpeed folder to temporarily fix the test issue?
* Revert to try if Accelerate fix works
* Use latest Accelerate release
* Quality and fixes
* Style
* Quality
* Add doc
* Test + fix
* More blocks
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Fix torch.jit.script and pickling issues
* Fix get_attr issues
* Fix import in function
* Fix GPT-J and T5 tracing for torch=1.11
* Gate graph surgery on torch version
* Modeling minor changes to enable TorchScripting
* Model serialization / deserialization test
* Remove _assert_is_none users
* add inference example to LayoutLMv2ForQuestionAnswering, passing doctest
* add loss example to LayoutLMv2ForQuestionAnswering, passing doctest
* Add correct doctest for LayoutLMv2ForTokenClassification, passing doctest
* add correct doctest for LayoutLMv2ForSequenceClassification, passing test
* add correct doctest for LayoutLMv2Model, passing test
* make fixup
* fix to address review comments
* make style
* fix doctest line break issue, add to documentaiton_tests.txt, address review comments
* move comment about layoutlmv2 dependencies to the doc page
* format doc page as suggested
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* delete extraneous backtick
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* average loss over batches and accumulated steps for tracking
* fix layernorm weight decay
* use AdamW from Pytorch instead of Transformers
* add shuffling of sequences inside the batches
* add shuffling of sequences inside the batches
* add logging dir and reformat code
* fix lr tracking
* remove Mistral scaling
* keep Mistral scaling
* reformat code
* fix error
* fix error
* use shuffling function from Pytorch
* remove argument for shuffling batch sequences as it isn't optional
* update package versions and install accelerate from source
* remove unused package
* Update loss average over accumulated steps
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Update loss average over accumulated steps
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* use one shuffle buffer argument
* compute avg_loss in one line
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* [BC] Fixing usage of text pairs
The BC is actually preventing users from misusing the pipeline since
users could have been willing to send text pairs and the pipeline would
instead understand the thing as a batch returning bogus results.
The correct usage of text pairs is preserved in this PR even when that
makes the code clunky.
Adds support for {"text":..,, "text_pair": ...} inputs for both dataset
iteration and more explicit usage to pairs.
* Updating the doc.
* Update src/transformers/pipelines/text_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/pipelines/text_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/pipelines/test_pipelines_text_classification.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* quality.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Fix length in no_trainer examples
* Add setup and teardown
* Use new accelerator config generator to automatically make tests able to run based on environment
* Add information gain filtration algorithm
* Complying with black requirements
* Added author
* Fixed import order
* flake8 corrections
Co-authored-by: Javier Turek <javier.turek@intel.com>
* added type hints to prophetnet
* reformatted with black
* fix bc black misformatted some parts
* fix imports
* fix imports
* Update src/transformers/models/prophetnet/configuration_prophetnet.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* update OPTIONAL type hint and docstring
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* [LED] fixed global_attention_mask not passed for generation + docs clarification for gradient checkpointing
* LED docs clarification
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [LED] gradient_checkpointing=True should be passed to TrainingArguments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [LED] docs: remove wrong word
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [LED] docs fix typo
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
- Add --ignore_mismatched_sizes argument to classification examples
- Expand the error message when loading a model whose head dimensions are different from expected dimensions
* Initial commit
* Better label renaming
* Remove breakpoint before pushing (this is your job)
* Test a lot more in the Keras fit() test
* make fixup
* Clarify the case where we flatten y dicts into tensors
* Clarify the case where we flatten y dicts into tensors
* Extract label name remapping to a method
* Automatically sort auto mappings
* Better class extraction
* Some auto class magic
* Adapt test and underlying behavior
* Remove re-used config
* Quality
* fixed bug run_mlm_flax_stream.py
Fixed bug caused by an update to tokenizer keys introduced in recent transformers versions (between `4.6.2` and `4.18.0`) where additional keys were introduced to the tokenizer output.
* Update run_mlm_flax_stream.py
* adding missing paranthesis
* formatted to black
* remove cols from dataset instead
* reformat to black
* moved rem. columns to map
* formatted to black
Co-authored-by: KennethEnevoldsen <kennethcenevolsen@gmail.com>
* [doc] performance/scalability revamp
* link the new docs
* no :
* mixed precision
* work on the first doc
* expand the main doc
* Trigger CI
* style
* revamp single GPU training section
* work on training performance
* remove files not used anymore or will be added later
* final touches
* fix rebase
* Add hardware section to toctree
* fix toctree again
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove `fast_tokenizers` entry that was copied in rebase
* add warning about DP vs DDP
* remove todo
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix missing closure of codeblock
* Update docs/source/en/perf_train_gpu_many.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* sync with #16860
* update toc
Co-authored-by: leandro <leandro.vonwerra@spoud.io>
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial
* Delete docs/source/pt-br directory
* [ fast_tokenizers.mdx ] - Continuing work on file
* [ fast_tokenizers.mdx ] - Continuing work on file
* Add fast tokenizers to _toctree.yml
* Eliminated config and toctree.yml
* Nits in fast_tokenizers.mdx
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Added translation of installation.mdx to Portuguese, as well
as default templates of _toctree.yml and _config.py
* [ build_documentation.yml ] - Updated doc_builder to build
documentation in Portuguese.
[ pipeline_tutorial.mdx ] - Created translation for the pipeline_tutorial.mdx.
* [ build_pr_documentation.yml ] - Added pt language to pr_documentation builder.
[ pipeline_tutorial.mdx ] - Grammar changes.
* [ accelerate.mdx ] - Translated to Portuguese the acceleration tutorial.
* [ multilingual.mdx ] - Added portuguese translation for multilingual tutorial.
[ training.mdx ] - Added portuguese translation for training tutorial.
* [ preprocessing.mdx ] - WIP
* Update _toctree.yml
* Adding Pré-processamento to _toctree.yml
* Update accelerate.mdx
* Nits and eliminate preprocessing file while it is ready
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Add test to ensure models can take int64 inputs
* is_integer is an attribute, not a method
* Fix test when some inputs aren't tensors
* Add casts to blenderbot and blenderbot-small
* Add casts to the other failing models
## Motivation
We are going to use a new blob account to store the checkpoints.
## Modification
Modify the azure blob storage URLs for BEiT checkpoints.
* First version - OPT model
* Final changes
- putting use cache to False
* few changes
- remove commented block
* few changes
- remove unecessary files
* fix style issues
* few changes
- remove a test file
- added the logits test
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add gen tests
* few changes
- rm mask filling example on docstring
* few changes
- remove useless args
* some changes
- more tests should pass now
- needs to clean more
- documentation still needs to be done
* fix code quality
* major changes
- change attention architecture to BART-like
- modify some tests
- style fix
* rm useless classes
- remove opt for:
- QA
- cond generation
- seq classif
* Removed autodoc calls to non-existant classes
TOkenizers are not implemented
* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/auto/modeling_tf_auto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Replaced OPTTokeniser with GPT2 tokenizer
* added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokenizer")
* Removed OPTTokenizer
* make style
* Make style replaces
``` ...).unsqueeze(```
by
``` >>>).unsqueeze(```
* make repo consistency
* Removed PretrainedOPTModel
* fix opt.mdx removed other heads
* fix init, removed 3 heads
* removed heads
* finished cleaning head
* removed seauence classif and question answering
* removed unused imports
* removed useless dummy object for QA, SC and CG
* removed tests for removed useless dummy object for QA, SC and CG
* Removed head_mask using encoder layers which don't exist
* fixed test
* fix line
* added OPT to toctree
* Updated model path with pushed weigths
* fix model path
* fixed code quality
* fixed embeddings and generation tests
* update paths
* clean comments
* removed OPTClassificationHead for sentence classification
* renamed hidden layer
* renamed num layers to standard num_hidden_layers
* num_attention_heads fix
* changes for 125m
* add first version for 125m
* add first version - flax
* add new version
* causal LM output
* replace output type with BaseModelOutputWithPastAndCrossAttentions
* revert working config from 150m to 350m
* clean
* removed decoder input ids
* fixed embed dim
* more embed_dim issues
* make style + removed enc_dec test
* update falx model
* removed troublesome copy
* added is_encoder_decoder=False to config
* added set_input emb fuinction to model class
* requires torch on embed test
* use head mask instead of decoder head mask input param solves a test
* 8 test remaining, update
* Updated create_and_check_decoder_model_past_large_inputs
* Make style
* update op tokenizer with condition
* make style
* See if I can push
* some clean up
* remove linear head hack
* save intermediate
* save correct attention
* add copied from from bart
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix part of the reviewss
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* same changes in naming / conversion
* correct mask
* more fixes
* delete FlaxOPT and TfOPT
* clean traces of Flax and Tf
* fix mask
* fixed positionnal embedding length when past key value is provoded
* get 125m, 6.7b to work
* Added do_layer_norm
* solved mismatch in load dictionnary
* clean up preapre opt input dict
* fixed past key value as bool
* fix previus
* fixed return dict False tuple issue
* All tests are passing
* Make style
* Ignore OPTDecoder non tested
* make fix-copies
* make repo consistency
* small fix
* removed uselss @torch.no_grad decorator
* make styl;e
* fix previous opt test
* style
* make style
* added opt documentation
* update OPT_PRETRAINED_MODEL_ARCHIVE_LIST
* up
* more fixes
* model & config work
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* added comment on padding hack (+2)
* cleaup
* review update
* docstring for missing arg
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/opt/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update pretrained map
* update path and tests
* make style
* styling
* make consistency
* add gpt2 tok new
* more tok fixes
* Update src/transformers/models/auto/tokenization_auto.py
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/en/model_doc/opt.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/models/opt/test_modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/opt/modeling_opt.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update based on reviews
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* make style
* make tokenizer auto tests pass
* apply Lysandre suggestion
* finish tests
* add some good tokenizer tests
* improve docs slighly
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Change nits in Spanish for quicktour.mdx
- Add tasks names in English too.
- Fix small nits in Spanish
* Translate index.mdx to Spanish
* Translate body of index.
* Translated the compatible models list (not the papers´ names). Since this should not be updated manually, I can come back to the original text.
* Add models and a dataset for Spanish in the code exmaples
* Replaced the English models to Spanish versions.
* Add index to _toctree.yml and fix Spanish
* Fix double ““ error
* Change negative example in ASR example
* make style
* Debug style in quicktour.mdx
* [WIP] Add FLAVA model
This PR aims to add [FLAVA](ihttps://arxiv.org/abs/2112.04482) model to the transformers repo.
Following checklist delineates the list of things to be done for this PR
to be complete:
[x] Flava init
[x] Flava base models
[x] Flava layers
[x] Flava Configs
[x] Flava encoders
[x] Flava pretraining models
[ ] Flava classification/retrieval models (To be added in a separate PR)
[x] Documentation updates
[x] Imports updates
[x] Argstring updates
[x] Flava pretrained checkpoints
[x] Flava tests
[x] Flava processors
[x] Sanity check
[x] Lint
* add seed worker and set_deterministic_seed_for_cuda function to enforce reproducability
* change function name to enable determinism, add docstrings, reproducability support for tf
* change function name to enable_determinism_for_distributed_training
* revert changes in set_seed and call set_seed within enable_full_determinism
* add one position argument for seed_worker function
* add full_determinism flag in training args and call enable_full_determinism when it is true
* add enable_full_determinism to documentation
* apply make fixup after the last commit
* Update src/transformers/training_args.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* unhardcode pretrained model path, make it a class var
* add tests for mobilebert tokenizer
* allow tempfiles for vocab & merge similarity test to autodelete
* add explanatory comments
* remove unused imports, let make style do its.. thing
* remove inheritance and use BERT tok tests for MobileBERT
* Update tests/mobilebert/test_tokenization_mobilebert.py
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* amend class names, remove unused import, add fix for mobilebert's hub pathname
* unhardcode pretrained model path, make it a class var
* add tests for mobilebert tokenizer
* allow tempfiles for vocab & merge similarity test to autodelete
* add explanatory comments
* remove unused imports, let make style do its.. thing
* remove inheritance and use BERT tok tests for MobileBERT
* Update tests/mobilebert/test_tokenization_mobilebert.py
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* amend class names, remove unused import, add fix for mobilebert's hub pathname
* amend paths for model tests being in models/ subdir of /tests
* explicitly rm test from prev path
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* add support for MLFLOW_FLATTEN_PARAMS
* ensure key is str
* fix style and update warning msg
* Empty commit to trigger CI
* fix bug in check_inits.py
* add unittest for flatten_dict utils
* fix 'NoneType' object is not callable on __del__
* add generic flatten_dict unittest to SPECIAL_MODULE_TO_TEST_MAP
* fix style
* ensure mlflow.end_run() is executed at end of training when mlflow.start_run() was executed by the callback
* add debug msg
* add support for MLFLOW_TAGS, MLFLOW_RUN_ID, and MLFLOW_NESTED_RUN
* update to support python 3.6+
* Validate env variables using ENV_VARS_TRUE_VALUES
* Empty-Commit
* PyTorch FSDP integration in Trainer
* reformatting
make style and make quality are now compliant.
* Updating dependency check
* Trigger CI
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Add type hints for remaining BigBirdPegasus models
Here I added type hints to the BigBirdPegasusForCausalLM class.
* Add missing type hints for Data2VecText models
Added type hints to the Data2VecTextForCausalLM, Data2VecTextForMaskedLM,
Data2VecTextForMultipleChoice, Data2VecTextForQuestionAnswering,
Data2VecTextForSequenceClassification, and
Data2VecTextForTokenClassification classes.
* add get_overflowing_images function to ensure 1-to-1 mapping between samples and images in LayoutLMv2Processor
* make style
* add test for overflowing_tokens, change assert to ValueError, avoiding unrelated formatting changes
* change line length by passing --preview into black
* Added spanish translation of autoclass_tutorial.
Added 'local' and 'title' fields for autoclass_tutorial.
* Fixed autoclass_tutorial title in _toctree.yml and autoclass_tutorial.mdx
* CLIP Serving
* Add type hints per code review
* Use black, flake8, and isort
* Update src/transformers/models/clip/modeling_tf_clip.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Rollback serving_output and add TODO
* Remove irrelevant portions of failing tests
* Revert "Rollback serving_output and add TODO"
This reverts commit a4abfa6ba3b7875a13538dbc2ddc4eb17dfcca8d.
* Rollback to original test/serving_output
* Fix unused var
* Apply suggestions from code review
* Update formatting with black
* Fix style again from rebase
* Update tests/models/clip/test_modeling_tf_clip.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sean Moriarity <sean.l.moriarity.mil@army.mil>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Type hint complete Albert model file.
* Update typing.
* Update src/transformers/models/albert/modeling_albert.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* [T5 Tokenizer] Model has no fixed position ids - there is no hardcoded max length
* [T5 Tokenizer] Model has no fixed position ids - there is no hardcoded max length
* correct t5 tokenizer
* correct t5 tokenizer
* fix test
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* finish
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft
* Add YolosForObjectDetection
* Make forward pass work
* Add mid position embeddings
* Add interpolation of position encodings
* Add expected values
* Add YOLOS to tests
* Add integration test
* Support tiny model as well
* Support all models in conversion script
* Remove mid_pe_size attribute
* Make more tests pass
* Add model to README and fix config
* Add copied from statements
* Rename base_model_prefix to vit
* Add missing YOLOS_PRETRAINED_CONFIG_ARCHIVE_MAP
* Apply suggestions from code review
* Apply more suggestions from code review
* Convert remaining checkpoints
* Improve docstrings
* Add YolosFeatureExtractor
* Add feature extractor to docs
* Add corresponding tests
* Fix style
* Fix docs
* Apply suggestion from code review
* Fix bad rebase
* Fix some more bad rebase
* Fix missing character
* Improve docs and variable names
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Add meta proxy
* Uses meta data to trace data dependent control-flow
* Remove commented class
* Handles torch creating functions
* Added type annotation to fix tracing
* Tracing works for everything but T5 and GPT-J
* Almost all previously supported models pass
* All architectures can be traced except T5
* Intermediate commit to have a trace of the comparison operators for HFProxy
* Everything works, except loss computation
* Everything works
* Removed unused import
* Overriden methods do not use underlying ops (linear and torch.matmul), and model attributes are copied to the traced version
* Fix torch_matmul_override
* Change attributes reference to deepcopy
* Remove breakpoint and add torch_index_override
* Small fix
* Fix typo
* Replace asserts by explicit exceptions
The emoji version must be either 0.5.4 or 0.6.0. Newer emoji versions have been updated to newer versions of the Emoji Charts, thus not consistent with the one used for pre-processing the pre-training Tweet corpus (i.e. not consistent with the vocab).
1. Fixes evaluation errors popping up when you train/eval on squad v2 (one was newly encountered and one that was previously reported Running SQuAD 1.0 sample command raises IndexError #15401 but not completely fixed).
2. Removes boolean arguments that don't use store_true. Please, don't use these: *ANY non-empty string is being converted to True in this case and this clearly is not the desired behavior (and it creates a LOT of confusion).
3. All no-trainer test scripts are now saving metric values in the same way (with the right prefix eval_), which is consistent with the trainer-based versions.
4. Adds forgotten model.eval() in the no-trainer versions. This improved some results, but not everything (see the discussion in the end). Please, see the F1 scores and the discussion below.
* Add first draft
* Improve script and README
* Improve README
* Apply suggestions from code review
* Improve script, add link to resulting model
* Add corresponding test
* Adjust learning rate
* Add doctest BERT
* make fixup
* fix typo
* change checkpoints
* make fixup
* define doctest output value, update doctest for mobilebert
* solve fix-copies
* update QA target start index and end index
* change checkpoint for docs and reuse defined variable
* Update src/transformers/models/bert/modeling_tf_bert.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* make fixup
* Add Doctest for Albert and Bigbird
* make fixup
* overwrite examples for Albert and Bigbird
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update longer examples for Bigbird
* using examples from squad_v2
* print out example text
* change name token-classification-big-bird checkpoint to random
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add tflops logging and fix grad accumulation
* add accelerate tracking and checkpointing
* scale loss of last batch correctly
* fix typo
* compress loss computation
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* add resume from checkpoint argument
* add load_state accelerate from checkpoint, register lr scheduler and add tflops function
* reformat code
* reformat code
* add condition on path for resume checkpoint
* combine if conditions
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* add source for tflops formula
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* add gptj to TOKENIZER_MAPPING_NAMES
* fix int32 to float to avoid problem in onnx
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: ChainYo <t.chaigneau.tc@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
- all activations should be fetched through ACT2FN
- it returns ReLU as `nn.Module`, which allows attaching hooks on the activation function and prints it to stdout when `print(model)`
* Adding support for `array` key in raw dictionnaries in ASR pipeline.
* ES .
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Making it work by not popping `array` first.
* Black 22.3
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Correct Logging of Eval metric to Tensorboard
An empty dictionary ``eval_metrics`` was being logged, is replaced by ``eval_metric`` which is the output dictionary of ``metric.compute()``.
* Remove unused variable
* Add doc about `attention_mask` on gpt2
Add a simple sentence describing how `attention_mask` needs to be constructed when ``past_key_values` is used.
* Add doc about attention_mask on gpt2_tf
* clean up style
* remove empty line white spaces
* remove whitespace in empty line
* Add first draft
* Improve README and run fixup
* Make script aligned with other scripts, improve README
* Improve script and add test
* Remove print statement
* Apply suggestions from code review
* Add num_labels to make test pass
* Improve README
* begin do_init
* add params_shape_tree
* raise error if params are accessed when do_init is False
* don't allow do_init=False when keys are missing
* make shape tree a property
* assign self._params at the end
* add test for do_init
* add do_init arg to all flax models
* fix param setting
* disbale do_init for composite models
* update test
* add do_init in FlaxBigBirdForMultipleChoice
* better names and errors
* improve test
* style
* add a warning when do_init=False
* remove extra if
* set params after _required_params
* add test for from_pretrained
* do_init => _do_init
* chage warning to info
* fix typo
* add params in init_weights
* add params to gpt neo init
* add params to init_weights
* update do_init test
* Trigger CI
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update template
* trigger CI
* style
* style
* fix template
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Solved href rendering issue in heading
Markdown references in headings such as '####' don't render well.
Replaced it with <h4>...<a></a></h> banners.
* PhonemeTokenizer optimization using phonemizer lib
The backend should only be initialized once, otherwise it is reloaded.
Added `init_backend` function, intializes a backend attribute.
Phonemize re-uses self.backend.
Should give ~10 times faster phonemization.
* formatted file with make style
* Documentation suggestion
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update /tokenization_wav2vec2_phoneme.py based on PR suggestion
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update CONTRIBUTING.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add first draft from previous PR
* First draft
* Improve README and remove num_labels
* Make script more aligned with other scripts
* Improve README and apply suggestion from code review
* Add passing encoder_outputs as tuple to existing test
* Add check for tuple
* Add check for tuple also for speech and vision
Co-authored-by: jsnfly <jsnfly@gmx.de>
* Improve code
* Fix bugs
* Fix another bug
* Clean up DTP as well
* Update DPT model outputs
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* [trainer / deepspeed] fix hyperparameter_search
* require optuna
* style
* oops
* add dep in the right place
* create deepspeed-testing dep group
* Trigger CI
* Change tracking to store_true
* Remove step param and use it in the log dictionary directly
* use vars(args) when passing args to init_trackers
* Include tracking tests since tensorboard is already a dep
* Improve CTRL doctests
* Fix `CTRLForSequenceClassification` flakiness with inconsistent losses
* Remove unused
* Fixup
* Add CTRL to documentation_tests.txt
* Fix control code not being first
* Add output assertions
* Change from sshleifer/tiny-ctrl -> ctrl
* Run `make fixup`
* apply `list` to output logits shape for clarity
* Reduce output loss precision to make assertion more robust
* Add assertion of control code being first
* Fix docstyle
* upper case sentence following control code
* Weird bug fixes
* Add a better generation example
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Required the values GPTJ unfortunately cannot run the model =)
* Added the file to the doc tests
* Run Fixup and Style
* Fixed with the test versions of gptj. Ran Style and Fixup.
* Trigger ci
* A Minor Change to License
* Fixed spacing added to the benchmark_utils. Then refactored tests to const variables.
* Removed strings that were included as default parameters anyways.
Co-authored-by: ArEnSc <xx.mike.chung.xx@gmail.com>
* Fix setters of *_token_id properties of SpecialTokensMixin
* Test setters of common tokens ids
* Move to a separate test checks of setters of tokens ids
* Add independent test for ByT5
* Add Canine test
* Test speech to text
* Change the chunk_iter function to handle
the subtle cases where the last chunk gets ignored since all the
data is in the `left_strided` data.
We need to remove the right striding on the previous item.
* Remove commented line.
This avoids an unnecessary call and avoids problems during
initialization of class hierarchies.
Co-authored-by: Samuel Melm <samuel.melm@stud.uni-heidelberg.de>
* First Pass All Tests Pass
* WIP
* Adding file to documentation tests
* Change the base model for the example in the doc test.
* Fix Code Styling by running
make fixup
* Called Style
* Reverted to gpt2 model rather than distill gpt2
Then used a token classification model over a sequence model for an example.
* Fix Styling Issue
* Hopefully ignores the formatting issue.
Co-authored-by: ArEnSc <xx.mike.chung.xx@gmail.com>
* Fix t5 shard on TPU Pods
The current script doesn't work properly on a TPU pod because the global batch is not divided correctly per host.
This pull request fixes this issue by dividing the global batch to each host before it is shared on each host.
* fix style
Co-authored-by: ahmed-elnaggar <ahmed.elnaggar@allianz.com>
I create an archive of older checkpoints during training the checkpoint has a name with `f"{checkpoint_prefix}-*.zip/.tar `
previously `glob(f"{checkpoint_prefix}-*")` takes all files/folders starting with the name checkpoint, and later `shutil.rmtree(checkpoint)` takes a folder name; since at some point it my get a zip file; it crashes training; adding this `if os.path.isdir(x)` allows only folders on `glob_checkpoints`
* add simple multi gpu complet
* add human_eval_multi_gpu
* use copy strategy to distribute across gpu, to avoid padding
* add doc string
* update code style
* use task id to arrange output
* truncate input to avoid zero pad
* Stop the copy mechanism
* update style
* restore copies to scale better in distributed mode
* update style
* replace human eval
* Apply suggestions from code review
1. Tokenize all input at the same time
2. use attention_mask to get the input length
3. other small fixes
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* correct typo and update docstring
* update code style
* remove num sample division constraint
* remove max len calculation
* use accelerator.gather once to speed up
* use accelerate set_seed; update accelerate version
* correct gather bug
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* update proto sentencepiece model
* Revert "update proto sentencepiece model"
This reverts commit b07f671747fec35773d0b3d4788b8b15aefa0229.
* add check
* add test
* Revert "Revert "update proto sentencepiece model""
This reverts commit 46108257b8927b73627ec8f4f3eed53a95fc700d.
* test for log level
* test for log level 2
* warning at the warning level
* clean
* format
* add explanation in docstring
* Fixed some bugs involving saving during epochs
* Added tests mimicking the existing examples tests
* Added in json exporting to all `no_trainer` examples for consistency
* Add TapexTokenizer
* Improve docstrings and provide option to provide answer
* Remove option for pretokenized inputs
* Add TAPEX to README
* Fix copies
* Remove option for pretokenized inputs
* Initial commit: add tapex fine-tuning examples on both table-based question answering and table-based fact verification.
* - Draft a README file for running the script and introducing some background.
- Remove unused code lines in tabfact script.
- Disable the deafult `pad_to_max_length` option which is memory-consuming.
* * Support `as_target_tokenizer` function for TapexTokenizer.
* Fix the do_lower_case behaviour of TapexTokenizer.
* Add unit tests for target scenarios and cased/uncased scenarios for both source and target.
* * Replace the label BartTokenizer with TapexTokenizer's as_target_tokenizer function.
* Fix typos in tapex example README.
* * fix the evaluation script - remove the property `task_name`
* * Make the label space more clear for tabfact tasks
* * Using a new fine-tuning script for tapex-base on tabfact.
* * Remove the lowercase code outside the tokenizer - we use the tokenizer to control whether do_lower_case
* Guarantee the hyper-parameter can be run without out-of-memory on 16GB card and report the new reproduced number on wikisql
* * Remove the default tokenizer_name option.
* Provide evaluation command.
* * Support for WikiTableQuestion dataset.
* Fix a typo in README.
* * Fix the datasets's key name in WikiTableQuestions
* Run make fixup and move test to folder
* Fix quality
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply suggestions from code review
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply some more suggestions from code review
* Improve docstrings
* Overwrite failing test
* Improve comment in example scripts
* Fix rebase
* Add TAPEX to Auto mapping
* Add TAPEX to auto config mappings
* Put TAPEX higher than BART in auto mapping
* Add TAPEX to doc tests
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
Co-authored-by: SivilTaram <qianlxc@outlook.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Update README.md Support Image
Updates the Support image linking to our EAP page (to give it a refresh + help avoid image fatigue).
Slack thread checking in with #open-source-internal on this update (https://huggingface.slack.com/archives/C021H1P1HKR/p1648838903316709)
* Compressed Updated Support image
* Improves Support Image Logo + Height
Updated the image based on logo + size feedback. Big thanks to Bibi for making quick edits to this image.
* Add inputs vector to calculate metric method
* Include inputs for evaluation metrics with backwards compatibility
* Prevent inputs create OOM issue and documentation details
* Update style and code documentation
* Fix style formatting issues
* Update files format with make style
* Use CLIP model's config for some fields (if specified) instead of those of vision & text components.
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* 📝 add image/vision classification and asr
* 🖍 minor formatting fixes
* Fixed a typo in legacy seq2seq_trainer.py (#16531)
* Add ONNX export for BeiT (#16498)
* Add beit onnx conversion support
* Updated docs
* Added cross reference to ViT ONNX config
* call on_train_end when trial is pruned (#16536)
* Type hints added (#16529)
* Fix Bart type hints (#16297)
* Add type hints to PLBart PyTorch
* Remove pending merge conflicts
* Fix PLBart Type Hints
* Add changes from review
* Add VisualBert type hints (#16544)
* Adding missing type hints for mBART model (PyTorch) (#16429)
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model
Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction
For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent modesls coherent
Co-authored-by: matt <rocketknight1@gmail.com>
* Remove MBart subclass of XLMRoberta in tokenzier docs (#16546)
* Remove MBart subclass of XLMRoberta in tokenzier
* Fix style
* Copy docs from MBart50 tokenizer
* Use random_attention_mask for TF tests (#16517)
* use random_attention_mask for TF tests
* Fix for TFCLIP test (for now).
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Improve code example (#16450)
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
* Pin tokenizers version <0.13 (#16539)
* Pin tokenizers version <0.13
* Style
* Add code samples for TF speech models (#16494)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* [FlaxSpeechEncoderDecoder] Fix dtype bug (#16581)
* [FlaxSpeechEncoderDecoder] Fix dtype bug
* more fixes
* Making the impossible to connect error actually report the right URL. (#16446)
* Fix flax import in __init__.py: modeling_xglm -> modeling_flax_xglm (#16556)
* Add utility to find model labels (#16526)
* Add utility to find model labels
* Use it in the Trainer
* Update src/transformers/utils/generic.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Quality
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Enable doc in Spanish (#16518)
* Reorganize doc for multilingual support
* Fix style
* Style
* Toc trees
* Adapt templates
* Add use_auth to load_datasets for private datasets to PT and TF examples (#16521)
* fix formatting and remove use_auth
* Add use_auth_token to Flax examples
* add a test checking the format of `convert_tokens_to_string`'s output (#16540)
* add new tests
* add comment to overridden tests
* TF: Finalize `unpack_inputs`-related changes (#16499)
* Add unpack_inputs to remaining models
* removed kwargs to `call()` in TF models
* fix TF T5 tests
* [SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output (#16586)
* initialize the default rank set on TrainerState (#16530)
* initialize the default rank set on TrainerState
* fix style
* Trigger doc build
* Fix CI: test_inference_for_pretraining in ViTMAEModelTest (#16591)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* add a template to add missing tokenization test (#16553)
* add a template to add missing tokenization test
* add cookiecutter setting
* improve doc
* Update templates/adding_a_missing_tokenization_test/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* made _load_pretrained_model_low_mem static + bug fix (#16548)
* handle torch_dtype in low cpu mem usage (#16580)
* [Doctests] Correct filenaming (#16599)
* [Doctests] Correct filenaming
* improve quicktour
* make style
* Adding new train_step logic to make things less confusing for users (#15994)
* Adding new train_step logic to make things less confusing for users
* DO NOT ASK WHY WE NEED THAT SUBCLASS
* Metrics now working, at least for single-output models with type annotations!
* Updates and TODOs for the new train_step
* Make fixup
* Temporary test workaround until T5 has types
* Temporary test workaround until T5 has types
* I think this actually works! Needs a lot of tests though
* MAke style/quality
* Revert changes to T5 tests
* Deleting the aforementioned unmentionable subclass
* Deleting the aforementioned unmentionable subclass
* Adding a Keras API test
* Style fixes
* Removing unneeded TODO and comments
* Update test_step too
* Stop trying to compute metrics with the dummy_loss, patch up test
* Make style
* make fixup
* Docstring cleanup
* make fixup
* make fixup
* Stop expanding 1D input tensors when using dummy loss
* Adjust T5 test given the new compile()
* make fixup
* Skipping test for convnext
* Removing old T5-specific Keras test now that we have a common one
* make fixup
* make fixup
* Only skip convnext test on CPU
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Avoiding TF import issues
* make fixup
* Update compile() to support TF 2.3
* Skipping model.fit() on template classes for now
* Skipping model.fit() on template class tests for now
* Replace ad-hoc solution with find_labels
* make fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Adding missing type hints for BigBird model (#16555)
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model
Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction
For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent modesls coherent
* Type hints for BigBird
* removing typos
Co-authored-by: matt <rocketknight1@gmail.com>
* [deepspeed] fix typo, adjust config name (#16597)
* 🖍 apply feedback
Co-authored-by: Cathy <815244047@qq.com>
Co-authored-by: Jim Rohrer <jrohrer1@gmail.com>
Co-authored-by: Ferdinand Schlatt <fschlatt@gmail.com>
Co-authored-by: Dahlbomii <101373053+Dahlbomii@users.noreply.github.com>
Co-authored-by: Gunjan Chhablani <chhablani.gunjan@gmail.com>
Co-authored-by: Rishav Chandra Varma <rishavchandra.v16@iiits.in>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@nielss-mbp.home>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Karim Foda <35491698+KMFODA@users.noreply.github.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Andres Codas <andrescodas@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
Co-authored-by: Francesco Saverio Zuppichini <francesco.zuppichini@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
If global_attention_mask is found in the models inputs (used by certain
models, like LED) in the prediction_step method of Seq2SeqTrainer,
it is added to the gen_kwargs, which are passed to model.decode().
This allows us to properly set the global attention when decoding.
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model
Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction
For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent modesls coherent
* Type hints for BigBird
* removing typos
Co-authored-by: matt <rocketknight1@gmail.com>
* Adding new train_step logic to make things less confusing for users
* DO NOT ASK WHY WE NEED THAT SUBCLASS
* Metrics now working, at least for single-output models with type annotations!
* Updates and TODOs for the new train_step
* Make fixup
* Temporary test workaround until T5 has types
* Temporary test workaround until T5 has types
* I think this actually works! Needs a lot of tests though
* MAke style/quality
* Revert changes to T5 tests
* Deleting the aforementioned unmentionable subclass
* Deleting the aforementioned unmentionable subclass
* Adding a Keras API test
* Style fixes
* Removing unneeded TODO and comments
* Update test_step too
* Stop trying to compute metrics with the dummy_loss, patch up test
* Make style
* make fixup
* Docstring cleanup
* make fixup
* make fixup
* Stop expanding 1D input tensors when using dummy loss
* Adjust T5 test given the new compile()
* make fixup
* Skipping test for convnext
* Removing old T5-specific Keras test now that we have a common one
* make fixup
* make fixup
* Only skip convnext test on CPU
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_tf_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Avoiding TF import issues
* make fixup
* Update compile() to support TF 2.3
* Skipping model.fit() on template classes for now
* Skipping model.fit() on template class tests for now
* Replace ad-hoc solution with find_labels
* make fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model
Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction
For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
* type hints update
* making dependent modesls coherent
Co-authored-by: matt <rocketknight1@gmail.com>
* Translate accelerate.mdx from english to spanish
* Update docs/source_es/accelerate.mdx
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Apply suggestions from code review
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Apply suggestions from code review
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Fix nits and finish translation
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* make tuple annotation more specific to avoid failures during symbolic_trace
* make tuple annotation more specific to avoid failures during symbolic_trace
* first proposal
* replace model outputs in various models
* conflicts
* docstring
* update poolformer
* minor change in docstring
* CI
* removed poolformer specific outputs from doc
* removed convnext specific outputs from doc
* CI
* weird char in segformer
* conversations
* reverted docstring for BaseModelOutputWithPooling
* update outputs
* changed docstring in BaseModelOutput
* updated docstring in modeling outputs
* typos :)
* fixed typo after copy & paste it all around
* CI
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* segformer
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* feature extractor accepts
* resolved conversations
* added examples in test for ADE20K
* num_classes -> num_labels
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* resolving conversations
* resolving conversations
* removed ADE
* CI
* minor changes in conversion script
* reduce_labels in feature extractor
* minor changes
* correct preprocess for instace segmentation maps
* minor changes
* minor changes
* CI
* debugging
* better padding
* going to update labels inside the model
* going to update labels inside the model
* minor changes
* tests
* removed changes in feature_extractor_utils
* conversation
* conversation
* example in feature extractor
* more docstring in modeling
* test
* make style
* doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add unpack_inputs decorator to Main Layer
* add unpack_inputs decorator to Model
* add unpack_inputs decorator to LMHead Model
* add unpack_inputs decorator to Double Head Model
* add unpack_inputs decorator to Sequence Classification Model
* run fixup recipe
* make unpack_inputs the first decorator
* ported TFViTMAEIntermediate and TFViTMAEOutput.
* added TFViTMAEModel and TFViTMAEDecoder.
* feat: added a noise argument in the implementation for reproducibility.
* feat: vit mae models with an additional noise argument for reproducibility.
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add type hints for UniSpeech
* Added type hints for UniSpeechSat
* Added type hints for Wave2Vec2 (PT)
* Added type hints for models dependent of wave2vec
* Fix for test_mixed_precision
* Fix test_saved_model_creation by using shape_list instead of shape
* skit test_model_from_pretrained on GPU for now to avoid GPU OOM
* skip test_gptj_sample_max_time for now
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Added type hints for PyTorch T5 model
* removed a type hint
* ran make style
* added type hints for ibert pytorch
* added type hints for lxmert pytorch
* removed kwargs type hint and fixed arguments order
* Add missing type hints for ConvBERT flavored models.
* Update src/transformers/models/convbert/modeling_convbert.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Type hints and TF decorator added
* Re-add XLA generation method
* Re-add lines that were deleted by conflicting updates
* Re-add lines that were deleted by conflicting updates
* Re-add lines that were deleted by conflicting updates
Co-authored-by: matt <rocketknight1@gmail.com>
* Make BigBird model compatiable to fp16 dtype.
* Use tree_map instead of map
* Reformat the code
* Fix import order
* Convert masks to the correct dtype
* Fix format issue
* Address comments.
* Created the Decision Transformer Modle
* updating tests, copy to other machine
* Added last hidden size to Decision Transformer modelling outputs
* Removed copy of original DT file
* made a temporary change to gpt2 to have it conform with the Decision Transformer version
* Updated tests
* Ignoring a file used to test the DT model
* added comments to config file
* added comments and argument descriptions to decision transformer file
* Updated doc
* Ran "make style"
* Remove old model imports
* Removed unused imports, cleaned up init file
* Update docs/source/model_doc/decision_transformer.mdx
added my username
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Reverted changes made to gpt2
* Removed datasets submodule
* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states
* Added support for return of hidden states, attentions and return dict of gpt2 model.
* Updated tests to include many of the ModelTesterMixin tests.
The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes
* Added missing line to the end of gpt2 file
* Added an integration test for the Decision Transformer
Test performs and autoregressive evaluation for two time steps
* Set done and info to _ to fix failing test
* Updated integration test to be deterministic and check expected outputs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unnecessary config options
* Cleaned up commented code and old comments.
* Cleaned up commented code.
* Changed DecisionTransformer to Decision Transformer
* Added Decision Transformer to the main README file
* Added copy of GTP2 called DecisionTranformerGPT2Model
* isorted imports
* isorted imports
* Added model to non-English README files
* Ran make fix-copies and corrected some cases.
* Updated index file to include Decision Transformer
* Added gpt2 model as copy inside the Decision Transformer model file
* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS
* Deleted redundant checkpoint files (I don't know how these got committed)
* Removed testing files. (These should have never been committed)
* Removed accidentally committed files
* Moved the Decision Transformer test to its own directory
* Add type hints for Pegasus (#16324)
* Funnel type hints (#16323)
* add pt funnel type hints
* add tf funnel type hints
* Add type hints for ProphetNet PyTorch (#16272)
* [GLPN] Improve docs (#16331)
* Add link to notebook
* Add link
* Fix bug
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Added type hints for Pytorch Marian calls (#16200)
* Added type hinting for forward functions in pytorch marian
* typo correction
* Removed type hints on functions from BART per Suraj Patil request
* fix import pb
* fix typo
* corrected tuple call
* ran black
* after fix-copies
Some optional tags on primitives were removed, past_key_values in MarianForCausalLM changed from Tuple of Tuple to List
* Fixing copies to roformer and pegasus
Co-authored-by: Clementine Fourrier <cfourrie@inria.fr>
Co-authored-by: matt <rocketknight1@gmail.com>
* Moved DecisionTransformOutput to modeling_decision_transformer
* Moved the example usage to research project and cleaned comments
* Made tests ignore the copy of gpt2 in Decision Transformer
* Added module output to modelling decision transformer
* removed copied gpt2 model from list of transformers models
* Updated tests and created __init__ file for new test location
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unneeded summary type from config file
* Fixed copies
* Updated pretrained config map to refer to hopper-medium checkpoint
* done (#16340)
* Added Decision transformer to model docs
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add type annotations for Rembert/Splinter and copies (#16338)
* undo black autoformat
* minor fix to rembert forward with default
* make fix-copies, make quality
* Adding types to template model
* Removing List from the template types
* Remove `Optional` from a couple of types that don't accept `None`
Co-authored-by: matt <rocketknight1@gmail.com>
* [Bug template] Shift responsibilities for long-range (#16344)
* Fix code repetition in serialization guide (#16346)
* Adopt framework-specific blocks for content (#16342)
* ✨ refactor code samples with framework-specific blocks
* ✨ update training.mdx
* 🖍 apply feedback
* Updates the default branch from master to main (#16326)
* Updates the default branch from master to main
* Links from `master` to `main`
* Typo
* Update examples/flax/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Updated model with custom docstring example
* Created the Decision Transformer Modle
* updating tests, copy to other machine
* Added last hidden size to Decision Transformer modelling outputs
* Removed copy of original DT file
* made a temporary change to gpt2 to have it conform with the Decision Transformer version
* Updated tests
* Ignoring a file used to test the DT model
* added comments to config file
* added comments and argument descriptions to decision transformer file
* Updated doc
* Ran "make style"
* Remove old model imports
* Removed unused imports, cleaned up init file
* Update docs/source/model_doc/decision_transformer.mdx
added my username
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Reverted changes made to gpt2
* Removed datasets submodule
* Update the modeling outputs to include gpt2 attentions, hidden states and last hidden states
* Added support for return of hidden states, attentions and return dict of gpt2 model.
* Updated tests to include many of the ModelTesterMixin tests.
The following tests are skipped: test_generate_without_input_ids, test_pruning, test_resize_embeddings, test_head_masking, test_attention_outputs, test_hidden_states_output, test_inputs_embeds, test_model_common_attributes
* Added missing line to the end of gpt2 file
* Added an integration test for the Decision Transformer
Test performs and autoregressive evaluation for two time steps
* Set done and info to _ to fix failing test
* Updated integration test to be deterministic and check expected outputs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unnecessary config options
* Cleaned up commented code and old comments.
* Cleaned up commented code.
* Changed DecisionTransformer to Decision Transformer
* Added Decision Transformer to the main README file
* Added copy of GTP2 called DecisionTranformerGPT2Model
* isorted imports
* isorted imports
* Added model to non-English README files
* Ran make fix-copies and corrected some cases.
* Updated index file to include Decision Transformer
* Added gpt2 model as copy inside the Decision Transformer model file
* Added the unit test file to the list of TEST_FILES_WITH_NO_COMMON_TESTS
* Deleted redundant checkpoint files (I don't know how these got committed)
* Removed testing files. (These should have never been committed)
* Removed accidentally committed files
* Moved the Decision Transformer test to its own directory
* Moved DecisionTransformOutput to modeling_decision_transformer
* Moved the example usage to research project and cleaned comments
* Made tests ignore the copy of gpt2 in Decision Transformer
* Added module output to modelling decision transformer
* removed copied gpt2 model from list of transformers models
* Updated tests and created __init__ file for new test location
* Update README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Removed unneeded summary type from config file
* Fixed copies
* Updated pretrained config map to refer to hopper-medium checkpoint
* Added Decision transformer to model docs
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/modeling_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/decision_transformer/configuration_decision_transformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Updated model with custom docstring example
* Updated copies, config auto, and readme files.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dan Tegzes <48134725+Tegzes@users.noreply.github.com>
Co-authored-by: Adam Montgomerie <adam@avanssion.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Clementine Fourrier <cfourrie@inria.fr>
Co-authored-by: matt <rocketknight1@gmail.com>
Co-authored-by: Francesco Saverio Zuppichini <francesco.zuppichini@gmail.com>
Co-authored-by: Jacob Dineen <54680234+jacobdineen@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
* Make Transformers use cache files when hf.co is down
* Fix tests
* Was there a random circleCI failure?
* Isolate patches
* Style
* Comment out the failure since it doesn't fail anymore
* Better comment
* added type hints for mbart tensorflow tf implementation
* Adding missing type hints for mBART model
Tensorflow Implementation model added with missing type hints
* Missing Type hints - correction
For TF model
* Code fixup using make quality tests
* Hint types - typo error
* make fix-copies and make fixup
* type hints
* updated files
Co-authored-by: matt <rocketknight1@gmail.com>
* Initial commit
* Reversed signs, adjusted log entery.
* Check only when
* Cleanup checks
* Only trigger if we want to eval
* Run
* Move changes to callback
* Split file_utils in several submodules
* Fixes
* Add back more objects
* More fixes
* Who exactly decided to import that from there?
* Second suggestion to code with code review
* Revert wront move
* Fix imports
* Adapt all imports
* Adapt all imports everywhere
* Revert this import, will fix in a separate commit
* undo black autoformat
* minor fix to rembert forward with default
* make fix-copies, make quality
* Adding types to template model
* Removing List from the template types
* Remove `Optional` from a couple of types that don't accept `None`
Co-authored-by: matt <rocketknight1@gmail.com>
* Added type hinting for forward functions in pytorch marian
* typo correction
* Removed type hints on functions from BART per Suraj Patil request
* fix import pb
* fix typo
* corrected tuple call
* ran black
* after fix-copies
Some optional tags on primitives were removed, past_key_values in MarianForCausalLM changed from Tuple of Tuple to List
* Fixing copies to roformer and pegasus
Co-authored-by: Clementine Fourrier <cfourrie@inria.fr>
Co-authored-by: matt <rocketknight1@gmail.com>
* Add type annotations for TF Longformer
* Update docstring data types to include numpy array
* Implement unpack_inputs decorator
* fixup after decorator updates
* Numpy array -> np.ndarray in docstring
Co-authored-by: Johnny Greco <johnny.greco@radpartners.com>
* Add Flaubert to ONNX to make it available for conversion.
* Fixed features for FlauBERT. fixup command remove flaubert to docs list.
Co-authored-by: ChainYo <t.chaigneau.tc@gmail.com>
* Remove unused attributes
* Add link to blog and add clarification about input size
* Improve readability of the code
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* Add typing hints for base model class
* Add typing hints for causal LM model class
* Add typing hints for double heads model class
* Add typing hints for sequence classification model class
* Add typing hints for Main Layer
* Run fixup
* added type hints for BART model
* make fixup, adding imports to copied files
* Adding some missing types to cookiecutter
* Adding some missing types to cookiecutter
* Adding some missing types to cookiecutter
Co-authored-by: matt <rocketknight1@gmail.com>
* Update training.mdx
Fixed Error Raised Due to Wrongly Accessing Training Sample
* Ran make style
* Revert to Old Commit
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* up
* up
* up
* fix
* yeh
* ups
* Empty test commit
* correct quicktour
* correct
* correct
* up
* up
* uP
* uP
* up
* up
* uP
* up
* up
* up
* up
* up
* up
* up
* up
* up
* up
* Update src/transformers/models/van/modeling_van.py
* finish
* apply suggestions
* remove folder
* revert to daily testing
* Aggressive PT/TF equivalence test on PT side
* Ugly fix for `TFTapasForQuestionAnswering`
* apply review suggestions
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* value check for typical sampling
* value check for typical sampling
* change from float to int comparison
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Override _pad in LEDTokenizer
* Override _pad in LEDTokenizerFast
* add Copied from
* calling the super method
* add comment about -1
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Attention mask is important in the case of batching...
* Improve the fix.
* Making the sentence different enough that they exhibit different
predictions.
* Update expected slices for pillow > 9
* Add expected slices depending on pillow version
* Add different slices depending on pillow version for other models
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
* add types annotations for Beit (PyTorch)
* add types annotations for ViT (PyTorch)
* add types annotations for Deit (PyTorch)
* change Optional[bool] to bool into some places at Beit
* change Optional[bool] to bool into some places at ViT
* padding done
* correctly return one attention per layer
* almost correct, attentions are not flatten one tuple per stage
* tests green
* doc
* conversations
* reshaping hidden_states
* view in the test
* reshape_hidden_states in Encoder and Model
* new outputs with reshaped_hidden_states
* conversations
* doc
* Update docs/source/model_doc/swin.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* conversations
* fix tests
* minor changes
* resolved conversations
* attentions one per stage
* typo
* typos
* typos
* function signature
* CI
* clean up tests
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Minor fixes
* Fix vocab union
* Update examples/research_projects/xtreme-s/README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update README
* unused import
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix the last element in `hidden_states`
* fix missing elements in outputs for FlaxWav2Vec2EncoderLayerStableLayerNormCollection
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add type hints for FlauBERT PyTorch Base model. Others downstream tasks are inherited from XLM RoBERTa.
* Add type hints for FlaubERT Tensorflow models.
* fix output for TFFlaubertWithLMHeadModel
* First attempt at TF XLA generation
* Fix comments
* Update XLA greedy generate with direct XLA calls
* Support attention mask, prepare_inputs_for_generation no longer hardcoded for greedy
* Handle position_ids correctly
* make xla generate work for non xla case
* force using xla generate
* refactor
* more fixes
* finish cleaning
* finish
* finish
* clean gpt2 tests
* add gpt2 tests
* correct more cases
* up
* finish
* finish
* more fixes
* flake 8 stuff
* final rag fix
* Update src/transformers/models/rag/modeling_tf_rag.py
* finish t5 as well
* finish
* Update src/transformers/generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix inconsistent example variable naming
- Example code for a sequence classification in Tensorflow had spelling mistakes and incorrect and inconsistent naming
- Changed variable naming to be consistent with the two other TF examples
* Fix incorrect incorrect training examples
* Added spanish translation of quicktour.mdx
* Suggestions applied in the revision of the translation
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* first commit
* ResNet model correctly implemented.
basic modeling + weights conversion is done
removed unused doc
mdx file
doc and conversion script
added feature_extractor to auto
test
minor changes + style + quality
doc
test
Delete process.yml
A left over from my attempt of running circleci locally
* minor changes
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* new test format
* minor changes from conversations
* minor changes from conversations
* make style + quality
* readded the tests
* test + README
* minor changes from conversations
* error in README
* make fix-copies
* removed regression for classification head
* make quality
* fixed loss control flow
* fixed loss control flow
* resolved conversations
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* READMEs
* index.mdx
* minor changes
* updated tests and models
* unused import
* outputs
* Update docs/source/model_doc/resnet.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* added embeddings_size
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* conversation
* added push to hub
* test
* embedding_size
* make fix-copies
* resolved conversations
* CI
* changed organization
* minor changes
* CI
* minor changes
* conversations
* conversation
* doc
* tests
* removed unused docstring
* conversation
* removed unused outputs
* CI
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add type hints for LukeModel
* Add type hints for entitypairclassification
* Remove blank space
Co-authored-by: bhavika <bhavika@debian-BULLSEYE-live-builder-AMD64>
* Add type hints for TFDistilBert
* Update src/transformers/models/distilbert/modeling_tf_distilbert.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Spanish translation of the file training.mdx
* Settings - Spanish translation of the file training.mdx
* Latest changes to the Spanish translation of the training.mdx file
* Delete Hugging.mdx
* Last changes to the training fil Espanish version
* Latest modifications
* Latest changes, document ready for PR
* Nits
Co-authored-by: Yhary Arias <yharystefa@gmail.com>
Co-authored-by: Omar U. Espejel <espejelomar@gmail.com>
* Change unpacking of TF mobilebert inputs to use decorator
* Move unpack_inputs as the top decorator
* make fixup
Co-authored-by: ChienVM <chien_vm@detomo.co.jp>
* Make TF pt-tf equivalence test more aggressive
* Fix for TFConvNextModelTest and TFTransfoXLModelTest
* fix kwargs for outputs
* clean-up
* Add docstring for check_outputs()
* remove: need to rename encoder-decoder
* clean-up
* send PyTorch things to the correct device
* Add back the accidentally removed test case in test_pt_tf_model_equivalence()
* Fix: change to tuple before calling check_outputs()
* Fix: tfo could be a list
* use to_tuple()
* allow tfo only to be tuple or tensor
* allow tfo to be list or tuple for now + style change
* minor fix
* remove np.copy and update comments
* tfo -> tf_output, same for pt
* Add more detailed comment
* remove the incorrect comment
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add missing type hints for all flavors of LayoutLMv2 PyTorch models.
* Fixed return types and added type hints for LayoutLM.
* Fix removed arguments which breaks tests.
* add possibility to softly regulate length when using sampling method in model.generate() function
* fix test config, fix formatting
* fix rag integration, fix docstyling
* fix wrong docstring
* change param to tuple, add test
* fix old param in rag_model, remove unused import
* change test according to new param
* fix formatting
* fix test case
* fix doc style
* move start_length calculation to Logitprocessor
* add possibility to softly regulate length when using sampling method in model.generate() function
* fix rag integration, fix docstyling
* fix test config, fix formatting
* change param to tuple, add test
* fix old param in rag_model, remove unused import
* add possibility to softly regulate length when using sampling method in model.generate() function
* change param to tuple, add test
* fix old param in rag_model, remove unused import
* remove unused import
* fix small errors
* fix test
* add possibility to softly regulate length when using sampling method in model.generate() function
* fix test config, fix formatting
* fix rag integration, fix docstyling
* change param to tuple, add test
* fix old param in rag_model, remove unused import
* change test according to new param
* fix test case
* move start_length calculation to Logitprocessor
* add possibility to softly regulate length when using sampling method in model.generate() function
* fix rag integration, fix docstyling
* fix test config, fix formatting
* change param to tuple, add test
* fix old param in rag_model, remove unused import
* add possibility to softly regulate length when using sampling method in model.generate() function
* fix test config, fix formatting
* fix rag integration, fix docstyling
* add possibility to softly regulate length when using sampling method in model.generate() function
* fix rag integration, fix docstyling
* change param to tuple, add test
* fix old param in rag_model, remove unused import
* fix small errors
* Update src/transformers/generation_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/generation_utils.py
* Update src/transformers/generation_utils.py
* fix docstring, add type ind model rag
* fix docstrings
* introduce seq_length variable for cleaner code
* fix black formatting
* add input_ids_seq_length to modeling_rag
* add input_ids_seq_length to test
* retrigger checks
* retrigger checks
Co-authored-by: Kevin Bondzio <kev@AIM-LAP-02.local>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Kevin Bondzio <kev@AIM-LAP-02.fritz.box>
* Fix duplicate arguments passed to dummy inputs in ONNX export
* Fix M2M100 ONNX config
* Ensure we check PreTrained model only if torch is available
* Remove TensorFlow tests for models without PyTorch parity
* MVP
* apply decorator to TFBertModel
* finish updating bert
* update rembert (copy-linked to bert)
* update roberta (copy-linked to bert); Fix args
* Now working for non-text modalities
* Build the doc in a seperate folder then move it
* Allow job
* Is this it?
* Dislike comments?
* Copy instead of move
* Removing version built
* Typos
* No variable
* Take _versions.yml into account
* Finish main job and add dev job
* Forgot the run
* Fix syntax error
* Execute builder from the repo
* Typo
* Add ONNX support for ViT
* Refactor to use generic preprocessor
* Add vision dep to tests
* Extend ONNX slow tests to ViT
* Add dummy image generator
* Use model_type to determine modality
* Add deprecation warnings for tokenizer argument
* Add warning when overwriting the preprocessor
* Add optional args to docstrings
* Add minimum PyTorch version to OnnxConfig
* Refactor OnnxConfig class variables from CONSTANT_NAME to snake_case
* Add reasonable value for default atol
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* test
* up
* up
* Empty test commit
* up
* update tests
* up
* fix some vision models
* correct
* correct docs
* Trigger notification
* finalize
* check
* correct quicktour
* Apply suggestions from code review
* improve doctests
* Trigger Build
* next try
* next try
* and again
* Output current clone information
* Output current clone information
* Correct path
* add tf round again
* revert to daily job
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Seed get_train_sampler's generator with arg seed to improve reproducibility
and make the world_size<=1 code path more similar to the others
* move test file into trainer test explicitly
* dumb typo
* make style lint happy
* per discussion, switch to data_seed
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* added classes to get started with constrained beam search
* in progress, think i can directly force tokens now but not yet with the round robin
* think now i have total control, now need to code the bank selection
* technically works as desired, need to optimize and fix design choices leading to undersirable outputs
* complete PR #1 without disjunctive decoding
* removed incorrect tests
* Delete k.txt
* Delete test.py
* Delete test.sh
* revert changes to test scripts
* genutils
* full implementation with testing, no disjunctive yet
* shifted docs
* passing all tests realistically ran locally
* removing accidentally included print statements
* fixed source of error in initial PR test
* fixing the get_device() vs device trap
* fixed documentation docstrings about constrained_beam_search
* fixed tests having failing for Speech2TextModel's floating point inputs
* fix cuda long tensor
* added examples and testing for them and founx & fixed a bug in beam_search and constrained_beam_search
* deleted accidentally added test halting code with assert False
* code reformat
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
* fixing based on comments on PR
* took out the testing code that should but work fails without the beam search moditification ; style changes
* fixing comments issues
* docstrings for ConstraintListState
* typo in PhrsalConstraint docstring
* docstrings improvements
* finished adding what is sort of an opinionated implementation of disjunctive generation, but it revealed errors in inner beam search logic during testing.
* fixed bug found in constrained beam search that used beam_idx that were not global across all the batches
* disjunctive constraint working 100% correctly
* passing all tests
* Accidentally included mlruns
* Update src/transformers/generation_beam_constraints.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/generation_beam_constraints.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* complete overhaul of type complexities and other nits
* strict type checks in generate()
* fixing second round of feedback by narsil
* fixed failing generation test because of type check overhaul
* generation test fail fix
* fixing test fails
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Do not change the output from tuple to list - to match PT's version
* Fix the same issues for 5 other models and the template
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix to support fast tokenizer with `CLIPProcessor`
* Update CLIPProcessor test for fast tokenizer
* Fix Docstring Style
* Rename into meaningful Variable name in test code
* send PyTorch inputs to the correct device
* Fix: TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Update delete-dev-doc job to match build-dev-doc
* More debug info
* More debug info
* Stash if needed
* Remove the comment update
* Fix paths
* Wtf is going on..
* Fix git status test
* Try another way
* I don't understand what's happening
* Bash shell
* What's happening now...
* What's happening now...
* Try like this
* Back to trying to use bash
* And like that?
* Refine tests
* Stash after adding new files
* Stash after adding new files
* Proper commit sha and PR number
* Address review comments
* Add TF logits wrappers
* Add sample method
* add tests for TF logit wrappers
* TF generate sample tests now run on CPU
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* maskformer
* conflicts
* conflicts
* minor fixes
* feature extractor test fix
refactor MaskFormerLoss following conversation
MaskFormer related types should not trigger a module time import error
missed one
removed all the types that are not used
update config mapping
minor updates in the doc
resolved conversation that doesn't need a discussion
minor changes
resolved conversations
fixed DetrDecoder
* minor changes
minor changes
fixed mdx file
test feature_extractor return types
functional losses -> classes
removed the return type test for the feature extractor
minor changes + style + quality
* conflicts?
* rebase master
* readme
* added missing files
* deleded poolformers test that where in the wrong palce
* CI
* minor changes
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* resolved conversations
* minor changes
* conversations
[Unispeech] Fix slow tests (#15818)
* remove soundfile old way of loading audio
* Adapt slow test
[Barthez Tokenizer] Fix saving (#15815)
[TFXLNet] Correct tf xlnet generate (#15822)
* [TFXLNet] Correct tf xlnet
* adapt test comment
Fix the push run (#15807)
Fix semantic segmentation pipeline test (#15826)
Fix dummy_inputs() to dummy_inputs in symbolic_trace doc (#15776)
Add model specific output classes to PoolFormer model docs (#15746)
* Added model specific output classes to poolformer docs
* Fixed Segformer typo in Poolformer docs
Adding the option to return_timestamps on pure CTC ASR models. (#15792)
* Adding the option to return_timestamps on pure CTC ASR models.
* Remove `math.prod` which was introduced in Python 3.8
* int are not floats.
* Reworking the PR to support "char" vs "word" output.
* Fixup!
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Quality.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
HFTracer.trace should use/return self.graph to be compatible with torch.fx.Tracer (#15824)
Fix tf.concatenate + test past_key_values for TF models (#15774)
* fix wrong method name tf.concatenate
* add tests related to causal LM / decoder
* make style and quality
* clean-up
* Fix TFBertModel's extended_attention_mask when past_key_values is provided
* Fix tests
* fix copies
* More tf.int8 -> tf.int32 in TF test template
* clean-up
* Update TF test template
* revert the previous commit + update the TF test template
* Fix TF template extended_attention_mask when past_key_values is provided
* Fix some styles manually
* clean-up
* Fix ValueError: too many values to unpack in the test
* Fix more: too many values to unpack in the test
* Add a comment for extended_attention_mask when there is past_key_values
* Fix TFElectra extended_attention_mask when past_key_values is provided
* Add tests to other TF models
* Fix for TF Electra test: add prepare_config_and_inputs_for_decoder
* Fix not passing training arg to lm_head in TFRobertaForCausalLM
* Fix tests (with past) for TF Roberta
* add testing for pask_key_values for TFElectra model
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
[examples/summarization and translation] fix readme (#15833)
Add ONNX Runtime quantization for text classification notebook (#15817)
Re-enable doctests for the quicktour (#15828)
* Re-enable doctests for the quicktour
* Re-enable doctests for task_summary (#15830)
* Remove &
Framework split model report (#15825)
Add TFConvNextModel (#15750)
* feat: initial implementation of convnext in tensorflow.
* fix: sample code for the classification model.
* chore: added checked for from the classification model.
* chore: set bias initializer in the classification head.
* chore: updated license terms.
* chore: removed ununsed imports
* feat: enabled argument during using drop_path.
* chore: replaced tf.identity with layers.Activation(linear).
* chore: edited default checkpoint.
* fix: minor bugs in the initializations.
* partial-fix: tf model errors for loading pretrained pt weights.
* partial-fix: call method updated
* partial-fix: cross loading of weights (4x3 variables to be matched)
* chore: removed unneeded comment.
* removed playground.py
* rebasing
* rebasing and removing playground.py.
* fix: renaming TFConvNextStage conv and layer norm layers
* chore: added initializers and other minor additions.
* chore: added initializers and other minor additions.
* add: tests for convnext.
* fix: integration tester class.
* fix: issues mentioned in pr feedback (round 1).
* fix: how output_hidden_states arg is propoagated inside the network.
* feat: handling of arg for pure cnn models.
* chore: added a note on equal contribution in model docs.
* rebasing
* rebasing and removing playground.py.
* feat: encapsulation for the convnext trunk.
* Fix variable naming; Test-related corrections; Run make fixup
* chore: added Joao as a contributor to convnext.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: corrected copyright year and added comment on NHWC.
* chore: fixed the black version and ran formatting.
* chore: ran make style.
* chore: removed from_pt argument from test, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* fix: tests in the convnext subclass, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: moved convnext test to the correct location
* fix: locations for the test file of convnext.
* fix: convnext tests.
* chore: applied sgugger's suggestion for dealing w/ output_attentions.
* chore: added comments.
* chore: applied updated quality enviornment style.
* chore: applied formatting with quality enviornment.
* chore: revert to the previous tests/test_modeling_common.py.
* chore: revert to the original test_modeling_common.py
* chore: revert to previous states for test_modeling_tf_common.py and modeling_tf_utils.py
* fix: tests for convnext.
* chore: removed output_attentions argument from convnext config.
* chore: revert to the earlier tf utils.
* fix: output shapes of the hidden states
* chore: removed unnecessary comment
* chore: reverting to the right test_modeling_tf_common.py.
* Styling nits
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* minor changes
* doc fix in feature extractor
* doc
* typose
* removed detr logic from config
* removed detr logic from config
* removed num_labels
* small fix in the config
* auxilary -> auxiliary
* make style
* some test is failing
* fix a weird char in config prevending doc-builder
* retry to fix the doc-builder issue
* make style
* new try to fix the doc builder
* CI
* change weights to facebook
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* Create optimizer after model creation for SMP
* update dp_rank to rdp_rank for opt_state_dict
* update world_size and process_index for smp
* Address comments
* Lint fix
Co-authored-by: Cavdar <dcavdar@a07817b12d7e.ant.amazon.com>
* Add data2vec model cloned from roberta
* Add checkpoint conversion script
* Fix copies
* Update docs
* Add checkpoint conversion script
* Remove fairseq data2vec_text script and fix format
* Add comment on where to get data2vec_text.py
* Remove mock implementation cheat.py and fix style
* Fix copies
* Remove TF and Flax classes from init
* Add back copy from fairseq data2vec_text.py and fix style
* Update model name in docs/source/index.mdx to be CamelCase
* Revert model name in table to lower-case to get check_table test to pass
* Update src/transformers/models/data2vec/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update docs/source/model_doc/data2vec.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/model_doc/data2vec.mdx
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update documentation
* Copy-paste Data2VecConfig from BertConfig
* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency
* Update config special tokens to match RoBERTa
* Split multiple assertions and add individual error messages
* Rename Data2VecModel to Data2VecForTextModel
* Add Data2Vec to _toctree.yml
* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings
* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).
* finish audio model
* finish audio file
* Update names and fix style, quality and repo consistency
* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initilization in positional conv layers. Move back configurations for audio and text to separate files.
* add inputs to logits to data2vec'
* correct autio models
* correct config auto
* correct tok auto
* Update utils/tests_fetcher.py
* delete unnecessary files
* delete unnecessary files
* further renaming
* make all tests pass
* finish
* remove useless test file
* Update tests/test_modeling_common.py
* Update utils/check_repo.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec_text.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix copies
* Update docs
* Remove fairseq data2vec_text script and fix format
* Add comment on where to get data2vec_text.py
* Remove mock implementation cheat.py and fix style
* Fix copies
* Remove TF and Flax classes from init
* Add back copy from fairseq data2vec_text.py and fix style
* Update model name in docs/source/index.mdx to be CamelCase
* Revert model name in table to lower-case to get check_table test to pass
* Update documentation
* Update src/transformers/models/data2vec/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/convert_data2vec_original_pytorch_checkpoint_to_pytorch.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/auto/configuration_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/configuration_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/data2vec/modeling_data2vec.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Copy-paste Data2VecConfig from BertConfig
* Update config checkpoint to point to edugp/data2vec-nlp-base. Fix style and repo-consistency
* Update config special tokens to match RoBERTa
* Split multiple assertions and add individual error messages
* Rename Data2VecModel to Data2VecForTextModel
* Add Data2Vec to _toctree.yml
* Rename Data2VecEmbeddings to Data2VecForTextEmbeddings
* Add initial Data2VecForAudio model (unfinished). Only matching fairseq's implementation up to the feature encoder (before positional encoding).
* finish audio model
* finish audio file
* add inputs to logits to data2vec'
* Update names and fix style, quality and repo consistency
* Remove Data2VecAudioForPretraining. Add tests for Data2VecAudio, mimicking the Wav2Vec2 test suite. Fix bias initilization in positional conv layers. Move back configurations for audio and text to separate files.
* correct autio models
* correct config auto
* correct tok auto
* delete unnecessary files
* delete unnecessary files
* Update utils/tests_fetcher.py
* further renaming
* make all tests pass
* finish
* remove useless test file
* Update tests/test_modeling_common.py
* Update utils/check_repo.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/models/data2vec/modeling_data2vec_text.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Move data2vec tests to new structure
* Fix test imports for text tests
* Remove fairseq files
* Change paper link to arxiv
* Modify Data2Vec documentation to reflect that the encoder is not shared across the audio and text models in the current implementation.
* Update text model checkpoint to be facebook/data2vec-text-base
* Add 'Copy from' statements and update paper links and docs
* fix copy from statements
* improve copied from
* correct more copied from statements
* finish copied from stuff
* make style
* add model to README
* add to master
Co-authored-by: Eduardo Gonzalez Ponferrada <eduardo@ferrumhealth.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fixing the timestamps with chunking.
* The changes modified (and fixed) the striding tests.
* Adding a tokenizer test.
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Defense -> comment.
* Update src/transformers/models/wav2vec2/tokenization_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* rebase
* Delete shift tokens func
* downsample decoder input seq len for init
* correct attention mask
* add tests
* pt flax cross test
* make fixup
* init file for import
* change pt-flax cross test threshold
* pt-flax test logits only
* move tests
* make repo-consistency
* consistent indentation
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* feat: initial implementation of convnext in tensorflow.
* fix: sample code for the classification model.
* chore: added checked for from the classification model.
* chore: set bias initializer in the classification head.
* chore: updated license terms.
* chore: removed ununsed imports
* feat: enabled argument during using drop_path.
* chore: replaced tf.identity with layers.Activation(linear).
* chore: edited default checkpoint.
* fix: minor bugs in the initializations.
* partial-fix: tf model errors for loading pretrained pt weights.
* partial-fix: call method updated
* partial-fix: cross loading of weights (4x3 variables to be matched)
* chore: removed unneeded comment.
* removed playground.py
* rebasing
* rebasing and removing playground.py.
* fix: renaming TFConvNextStage conv and layer norm layers
* chore: added initializers and other minor additions.
* chore: added initializers and other minor additions.
* add: tests for convnext.
* fix: integration tester class.
* fix: issues mentioned in pr feedback (round 1).
* fix: how output_hidden_states arg is propoagated inside the network.
* feat: handling of arg for pure cnn models.
* chore: added a note on equal contribution in model docs.
* rebasing
* rebasing and removing playground.py.
* feat: encapsulation for the convnext trunk.
* Fix variable naming; Test-related corrections; Run make fixup
* chore: added Joao as a contributor to convnext.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: corrected copyright year and added comment on NHWC.
* chore: fixed the black version and ran formatting.
* chore: ran make style.
* chore: removed from_pt argument from test, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* fix: tests in the convnext subclass, ran make style.
* rebasing
* rebasing and removing playground.py.
* rebasing
* rebasing and removing playground.py.
* chore: moved convnext test to the correct location
* fix: locations for the test file of convnext.
* fix: convnext tests.
* chore: applied sgugger's suggestion for dealing w/ output_attentions.
* chore: added comments.
* chore: applied updated quality enviornment style.
* chore: applied formatting with quality enviornment.
* chore: revert to the previous tests/test_modeling_common.py.
* chore: revert to the original test_modeling_common.py
* chore: revert to previous states for test_modeling_tf_common.py and modeling_tf_utils.py
* fix: tests for convnext.
* chore: removed output_attentions argument from convnext config.
* chore: revert to the earlier tf utils.
* fix: output shapes of the hidden states
* chore: removed unnecessary comment
* chore: reverting to the right test_modeling_tf_common.py.
* Styling nits
Co-authored-by: ariG23498 <aritra.born2fly@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
* fix wrong method name tf.concatenate
* add tests related to causal LM / decoder
* make style and quality
* clean-up
* Fix TFBertModel's extended_attention_mask when past_key_values is provided
* Fix tests
* fix copies
* More tf.int8 -> tf.int32 in TF test template
* clean-up
* Update TF test template
* revert the previous commit + update the TF test template
* Fix TF template extended_attention_mask when past_key_values is provided
* Fix some styles manually
* clean-up
* Fix ValueError: too many values to unpack in the test
* Fix more: too many values to unpack in the test
* Add a comment for extended_attention_mask when there is past_key_values
* Fix TFElectra extended_attention_mask when past_key_values is provided
* Add tests to other TF models
* Fix for TF Electra test: add prepare_config_and_inputs_for_decoder
* Fix not passing training arg to lm_head in TFRobertaForCausalLM
* Fix tests (with past) for TF Roberta
* add testing for pask_key_values for TFElectra model
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Adding the option to return_timestamps on pure CTC ASR models.
* Remove `math.prod` which was introduced in Python 3.8
* int are not floats.
* Reworking the PR to support "char" vs "word" output.
* Fixup!
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Quality.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Enabling Beit SegFormer to `image-segmentation`.
* Fixing the score.
* Fix import ?
* Missing in type hint.
* Multiple test fixes:
- Add `raw_image` support. It should be the default IMHO since in Python
world it doesn't make any sense to base64 encode the image (Sorry
@mishig, didn't catch that in my review). I really think we should
consider breaking BC here.
- Add support for Segformer tiny test (needed
`SegformerModelTester.get_config` to enable TinyConfig
@NielsRogge)
- Add the check that `batch_size` works correctly on that pipeline.
Uncovered that it doesn't for Detr, which IMO is OK since images
after `feature_extractor` don't have the same size. Comment should
explain.
* Type hint as a string.
* Make fixup + update black.
* torch+vision protections.
* Don't use torchvision, use F.interpolate instead (no new dep).
* Last fixes for Segformer.
* Update test to reflect new image (which was broken)
* Update tests.
* Major BC modification:
- Removed the string compressed PNG string, that's a job for users
`transformers` stays in python land.
- Removed the `score` for semantic segmentation. It has hardly a meaning
on its own in this context.
- Don't include the grayscale with logits for now (which could enable
users to get a sense of confidence). Might be done later.
- Don't include the surface of the mask (could be used for sorting by
users, to filter out small masks). It's already calculable, and
it's easier to add later, than to add now and break later if we need.
* `make fixup`.
* Small changes.
* Rebase + doc fixup.
* [Proposal] Adding ZeroShotImageClassificationPipeline
- Based on CLIP
* WIP, Resurection in progress.
* Resurrection... achieved.
* Reword handling different `padding_value` for `feature_extractor` and
`tokenizer`.
* Thanks doc-builder !
* Adding docs + global namespace `ZeroShotImageClassificationPipeline`.
* Fixing templates.
* Make the test pass and be robust to floating error.
* Adressing suraj's comments on docs mostly.
* Tf support start.
* TF support.
* Update src/transformers/pipelines/zero_shot_image_classification.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* begin script
* update script
* fix features and data args
* main
* add requirements
* add column name args
* fix captions
* don't jit transforms
* fix caption
* fix labels, handle attention mask
* convert pixel values to numpy
* labels => input_ids
* transform images on the fly
* use AutoModel class, create the hybird model outside of the script
* fix version message
* add readme
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* adderss review comments
* add more comments
* allow freezing vision and text models
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix bug in PT speech-encoder-decoder
* add pt test for `inputs is not None`
* fix test
* new pt test
* Update tests/test_modeling_speech_encoder_decoder.py
* make fixup
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Very big changes concerning the tokenizer fast of CLIP which did not correspond to the tokenizer slow of CLIP
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Added all files, PoolFormerFeatureExtractor still failing tests
* Fixed PoolFormerFeatureExtractor not being able to import
* Completed Poolformer doc
* Applied Suggested fixes
* Fixed errors in modeling_auto.py
* Fix feature extractor, convert docs to Markdown, styling of code
* Remove PoolFormer from check_repo and fix integration test
* Remove Poolformer from check_repo
* Fixed configuration_poolformer.py docs and removed inference.py from poolformer
* Ran with black v22
* Added PoolFormer to _toctree.yml
* Updated poolformer doc
* Applied suggested fixes and added on README.md
* Did make fixup and make fix-copies, tests should pass now
* Changed PoolFormer weights conversion script name and fixed README
* Applied fixes in test_modeling_poolformer.py and modeling_poolformer.py
* Added PoolFormerFeatureExtractor to AutoFeatureExtractor API
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
* Implement activations as pytorch modules
* Apply fixup
* Add missing tests for activations
* Update docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* TF generate start refactor
* Add tf tests for sample generate
* re-organize
* boom boom
* Apply suggestions from code review
* re-add
* add all code
* make random greedy pass
* make encoder-decoder random work
* further improvements
* delete bogus file
* make gpt2 and t5 tests work
* finish logits tests
* correct logits processors
* correct past / encoder_outputs drama
* refactor some methods
* another fix
* refactor shape_list
* fix more shape list
* import shape
_list
* finish docs
* fix imports
* make style
* correct tf utils
* Fix TFRag as well
* Apply Lysandre's and Sylvais suggestions
* Update tests/test_generation_tf_logits_process.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Update src/transformers/tf_utils.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* remove cpu according to gante
* correct logit processor
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Fix loading pipelines with wav2vec models with lm when in local paths
* Adding tests
* Fix test
* Adding tests
* Flake8 fixes
* Removing conflict files :(
* Adding task type to test
* Remove unnecessary test and imports
* Rework AutoFeatureExtractor.from_pretrained internal
* Custom feature extractor
* Add more tests
* Add support for custom feature extractor code
* Clean up
* Add register API to AutoFeatureExtractor
* Compute loss independent from decoder (as 14139)
* fix expected seq_len + style
* Apply the same change to TFVisionEncoderDecoderModel
* fix style
* Add case with labels in equivalence test
* uncomment
* Add case with labels in equivalence test
* add decoder_token_labels
* use hf_compute_loss
* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add copied from
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Add TensorFlow support for ONNX export
* Change documentation to mention conversion with Tensorflow
* Refactor export into export_pytorch and export_tensorflow
* Check model's type instead of framework installation to choose between TF and Pytorch
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Alberto Bégué <alberto.begue@della.ai>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* added classes to get started with constrained beam search
* in progress, think i can directly force tokens now but not yet with the round robin
* think now i have total control, now need to code the bank selection
* technically works as desired, need to optimize and fix design choices leading to undersirable outputs
* complete PR #1 without disjunctive decoding
* removed incorrect tests
* Delete k.txt
* Delete test.py
* Delete test.sh
* revert changes to test scripts
* genutils
* full implementation with testing, no disjunctive yet
* shifted docs
* passing all tests realistically ran locally
* removing accidentally included print statements
* fixed source of error in initial PR test
* fixing the get_device() vs device trap
* fixed documentation docstrings about constrained_beam_search
* fixed tests having failing for Speech2TextModel's floating point inputs
* fix cuda long tensor
* added examples and testing for them and founx & fixed a bug in beam_search and constrained_beam_search
* deleted accidentally added test halting code with assert False
* code reformat
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_generation_utils.py
* fixing based on comments on PR
* took out the testing code that should but work fails without the beam search moditification ; style changes
* fixing comments issues
* docstrings for ConstraintListState
* typo in PhrsalConstraint docstring
* docstrings improvements
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* typical decoding
* changing arg name
* add test config params
* forgotten arg rename
* fix edge case where scores are same
* test for typical logits warper
* code quality fixes
* Add wrapper classes
* convert inner layers to tf
* Add TF Encoder and Decoder layers
* TFSpeech2Text models
* Loadable model
* TF model with same outputs as PT model
* test skeleton
* correct tests and run the fixup
* correct attention expansion
* TFSpeech2Text pask_key_values with TF format
* use_cache = False for PT models if labels is passed
* Fix for BigBirdPegasusForConditionalGeneration
* add warning if users specify use_cache=True
* Use logger.warning instead of warnings.warn
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* electra is added to onnx supported model
* add google/electra-base-generator for test onnx module
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
* Change the way tracing happens, enabling dynamic axes out of the box
* Update the tests and modeling xlnet
* Add the non recoding of leaf modules to avoid recording more values for the methods to record than what will be seen at tracing time (which would otherwise desynchronize the recorded values and the values that need to be given to the proxies during tracing, causing errors).
* Comments and making tracing work for gpt-j and xlnet
* Refactore things related to num_choices (and batch_size, sequence_length)
* Update fx to work on PyTorch 1.10
* Postpone autowrap_function feature usage for later
* Add copyrights
* Remove unnecessary file
* Fix issue with add_new_model_like
* Apply suggestions
* Wav2Vec2 models must either throw or deal with add_apater
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add pre-add_adapter backwards compatibility
* Add pre-add_adapter backwards compatibility
* Fix issue in tests/test_modeling_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
# Add support for W&B hyperparameter sweep
This PR:
* allows using wandb for running hyperparameter search.
* The runs are visualized on W&B sweeps dashboard
* This supports runnning sweeps on parallel devices, all reporting to the same central dashboard.
### Usage
**To run new a hyperparameter search:**
```
trainer.hyperparameter_search(
backend="wandb",
project="transformers_sweep", # name of the project
n_trials=5,
metric="eval/loss", # metric to be optimized, default 'eval/loss'. A warning is raised if the passed metric is not found
)
```
This outputs a sweep id. Eg. `my_project/sweep_id`
**To run sweeps on parallel devices:**
Just pass sweep id which you want to run parallel
```
trainer.hyperparameter_search(
backend="wandb",
sweep_id = "my_project/sweep_id"
)
```
* Adding support for `microphone` streaming within pipeline.
- Uses `ffmpeg` to get microphone data.
- Makes sure alignment is made to `size_of_sample`.
- Works by sending `{"raw": ..data.., "stride": (n, left, right),
"partial": bool}`
directly to the pipeline enabling to stream partial results and still
get inference.
- Let's `partial` information flow through the pipeline to enable caller
to get it back and choose to display text or not.
- The striding reconstitution is bound to have errors since CTC does not
keep previous state. Currently most of the errors are we don't know if
there's a space or not between two chunks.
Since we have some left striding info, we could use that during decoding
to choose what to do with those spaces and even extra letters maybe (if
the stride is long enough, it's bound to cover at least a few symbols)
Fixing tests.
Protecting with `require_torch`.
`raw_ctc` support for nicer demo.
Post rebase fixes.
Revamp to split raw_mic_data from it's live chunking.
- Requires a refactor to make everything a bit cleaner.
Automatic resampling.
Small fix.
Small fix.
* Post rebase fix (need to let super handle more logic, reorder args.)
* Update docstrings
* Docstring format.
* Remove print.
* Prevent flow of `input_values`.
* Fixing `stride` too.
* Fixing the PR by removing `raw_ctc`.
* Better docstrings.
* Fixing init.
* Update src/transformers/pipelines/audio_utils.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* Update tests/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* Quality.
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* Add torchvision's resize
* Rename torch_resize to default_to_square
* Apply suggestions from code review
* Add support for default_to_square and tuple of length 1
* add new test
* update test
* remove `tokenizer_file` from `additional_files_names` in `tokenization_utils_base.py`
* add `tokenizer_file` for the fast only tokenizer
* change global variables layoutxml
* remove `"tokenizer_file"` from DPR tokenizer's Global variables
* remove `tokenizer_file` from herbert slow tokenizer init
* `"tokenizer_file"` from LED tokenizer's Global variables
* remove `tokenizer_file` from mbart slow tokenizer init
* remove `tokenizer_file` from slow tokenizer template
* adapt to versioning
* adapt the `test_tokenizer_mismatch_warning` test
* clean test
* clarify `VOCAB_FILES_NAMES` in tokenization_utils_fast.py
* Revert "remove `tokenizer_file` from mbart slow tokenizer init"
This reverts commit 0dbb723fa9c7599d4640fe30b3647a74eb4a64e1.
* Revert "`"tokenizer_file"` from LED tokenizer's Global variables"
This reverts commit 5a3f879bdd651233f3d74a3d1146c34cde82b0c2.
* Revert "remove `tokenizer_file` from herbert slow tokenizer init"
This reverts commit f5e10007b7b0ec5345e015b9de7ffec72c5407fd.
* Revert "remove `"tokenizer_file"` from DPR tokenizer's Global variables"
This reverts commit da0895330bedfafc81ae3073470a9348c669f032.
* set `tokenizer_file` in super `__init__` of mbart
* replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`
* add test
* fix kwargs
* reformat test
* format
* format
* fix typo to render the documentation
* Update modeling_wav2vec2.py
With very tiny sound files (less than 0.1 seconds) the num_masked_span can be too long. The issue is described in issue #15366 and discussed with @patrickvonplaten.
* correct errors with mask time indices
* remove bogus file
* make fix-copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix loss calculation in TFFunnelForTokenClassification
* revert the change in TFFunnelForTokenClassification
* fix FunnelForTokenClassification loss
* fix other TokenClassification loss
* fix more
* fix more
* add num_labels to ElectraForTokenClassification
* revert the change to research projects
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add Luke training
* Fix true label tags
* Fix true label tags
* Fix true label tags
* Update the data collator for Luke
* Some training refactor for Luke
* Improve data collator for Luke
* Fix import
* Fix datasets concatenation
* Add the --max_entity_length argument for Luke models
* Remove unused code
* Fix style issues
* Fix style issues
* Move the Luke training into a separate folder
* Fix style
* Fix naming
* Fix filtering
* Fix filtering
* Fix filter
* Update some preprocessing
* Move luke to research_projects
* Checkstyle
* Address comments
* Fix style
* Fix the inconsistency of loss calculation between PT/TF XLNetLMHeadModel
* overwrite test_loss_computation
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* add xlm roberta xl
* add convert xlm xl fairseq checkpoint to pytorch
* fix init and documents for xlm-roberta-xl
* fix indention
* add test for XLM-R xl,xxl
* fix model hub name
* fix some stuff
* up
* correct init
* fix more
* fix as suggestions
* add torch_device
* fix default values of doc strings
* fix leftovers
* merge to master
* up
* correct hub names
* fix docs
* fix model
* up
* finalize
* last fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add copied from
* make style
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* clean commit of changes
* apply review feedback, make edits
* fix backticks, minor formatting
* 🖍 make fixup and minor edits
* 🖍 fix # in header
* 📝 update code sample without from_pt
* 📝 final review
* [deepspeed] saving checkpoint fallback when fp16 weights aren't saved
* Bump required deepspeed version to match usage when saving checkpoints
* update version
Co-authored-by: Mihai Balint <balint.mihai@gmail.com>
* Fixing support `batch_size` and `num_return_Sequences` in
`text-generation` pipeline
And `text2text-generation` too.
The bug was caused by the batch_size containing both the incoming batch
**and** the generated `num_sequences`.
The fix simply consists into splitting both of these again into
different dimensions.
* TF support.
* Odd backward compatibility script in the way.
* add new test
* add a feature to same the sentencepiece tokenizer model when the init file was deleted
* update marian
* update m2m_100
* fix marian
* update speech to text
* override test for layoutxlm
* fix saving bartpho
* remove harcoded values bartpho
* special token string version
* finish bartpho
* override layoutxml test
* add mbart
* move special tokens list
* format
* Revert "format"
This reverts commit 37a40df37903a932c2f951cbd33acb684246bae7.
* simplify list of string of special tokens
* Re-write `self.fairseq_tokens_to_ids ` initialization logic with special tokens
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* Fix prediction with generate() and the inference of column names
Should now have very few differences with the PyTorch implementation
* Minor edit to parent class
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Explaining the dict conversion
* Putting main_input_name back
* Fixes to main_input_name
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix_torch_device_generate_test
* remove @
* doc tests
* up
* up
* fix doctests
* adapt files
* finish refactor
* up
* save intermediate
* add more logic
* new change
* improve
* next try
* next try
* next try
* next try
* fix final spaces
* fix final spaces
* improve
* renaming
* correct more bugs
* finish wavlm
* add comment
* run on test runner
* finish all speech models
* adapt
* finish
* Added missing code in exemplary notebook - custom datasets fine-tuning
Added missing code in tokenize_and_align_labels function in the exemplary notebook on custom datasets - token classification.
The missing code concerns adding labels for all but first token in a single word.
The added code was taken directly from huggingface official example - this [colab notebook](https://github.com/huggingface/notebooks/blob/master/transformers_doc/custom_datasets.ipynb).
* Changes requested in the review - keep the code as simple as possible
* Avoid using get_list_of_files in config
* Wip, change tokenizer file getter
* Remove call in tokenizer files
* Remove last call to get_list_model_files
* Better tests
* Unit tests for new function
* Document bad API
* Add new model like command
* Bad doc-styler
* black and doc-styler, stop fighting!
* black and doc-styler, stop fighting!
* At last
* Clean up
* Typo
* Bad doc-styler
* Bad doc-styler
* All good maybe?
* Use constants
* Add doc and type hints
* More cleaning
* Add doc
* Fix Copied from
* Doc template
* Use typing.Pattern instead
* Framework-specific files
* Fixes
* Select frameworks clean model init
* Deal with frameworks in main init
* fixes
* Last fix
* Prompt user for info
* Delete exemple config
* Last fixes
* Add test config
* Fix bug with model_type included in each other
* Fixes
* More fixes
* More fixes
* Adapt config
* Remove print statements
* Will fix tokenization later, leave it broken for now
* Add test
* Quality
* Try this way
* Debug
* Maybe by setting the path?
* Let's try another way
* It should go better when actually passing the arg...
* Remove debug statements and style
* Fix config
* Add tests
* Test require the three backends
* intermediate commit
* Revamp pattern replacements and start work on feature extractors
* Adapt model info
* Finalize code for processors
* Fix in main init additions
* Finish questionnaire for processing classes
* Fix file name
* Fix for real
* Fix patterns
* Style
* Remove needless warnings
* Copied from should work now.
* Include Copied form in blocks
* Add test
* More fixes and tests
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comment
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* better
* save intermediate
* finish code
* up
* docs
* Apply suggestions from code review
* up
* add compute transition beam scores function to model and make sure scores are correct with eos
* apply nicos comments
* Apply suggestions from code review
* another fix
* Refine errors for pretrained objects
* PoC to avoid using get_list_of_files
* Adapt tests to use new errors
* Quality + Fix PoC
* Revert "PoC to avoid using get_list_of_files"
This reverts commit cb93b7cae8504ef837c2a7663cb7955e714f323e.
* Revert "Quality + Fix PoC"
This reverts commit 3ba6d0d4ca546708b31d355baa9e68ba9736508f.
* Fix doc
* Revert PoC
* Add feature extractors
* More tests and PT model
* Adapt error message
* Feature extractor tests
* TF model
* Flax model and test
* Merge flax auto tests
* Add tokenization
* Fix test
* Add missing __spec__ for transformers.models.auto
* Moves the __spec__-test to the UnitTest class
* Adds module_spec to all instances of _LazyModule
* Refactors an old test from pytest to unittest
* [EncoderDecoder] Add test for usage of extra kwargs
* [EncoderDecoder] Fix usage of extra kwargs in from pretrained
* [EncoderDecoder] apply suggested changes (passing **kwargs_encoder)
* [EncoderDecoder] create new test function and make sure it passes
Co-authored-by: jonas <jsnfly@gmx.de>
* [WIP] Make chuking smartly (long files) work on asr ctc_with_lm.
* Slow test with functionality.
* Fixing regular test.
* fix for batch size 1
* Handling batch outside `rescale_Stride`.
- Renamed to `rescale_stride`.
* Disable equality in the test.
* Remove print.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* First commit
* Add conversion script
* Make conversion script work for base model
* More improvements
* Update conversion script, works for vqa
* Add indexing argument to meshgrid
* Make conversion script work for ViltForPreTraining
* Add ViltForPreTraining to docs
* Fix device issue
* Add processor
* Add MinMaxResize to feature extractor
* Implement call method of ViltProcessor
* Fix tests
* Add integration test
* Add loss calculation for VQA
* Improve tests
* Improve some more tests
* Debug tests
* Small improvements
* Add support for attention_mask
* Remove mask_it
* Add pixel_mask
* Add tests for ViltFeatureExtractor
* Improve tests
* Add ViltForNaturalLanguageVisualReasoning
* Add ViltForNaturalLanguageVisualReasoning to conversion script
* Minor fixes
* Add support for image_embeds, update docstrings to markdown
* Update docs to markdown
* Improve conversion script
* Rename ViltForPreTraining to ViltForMaskedLM
* Improve conversion script
* Convert docstrings to markdown
* Fix code example of retrieval model
* Properly convert masked language model
* Add integration test for nlvr
* Fix code quality
* Apply suggestions from code review
* Add copied from statements
* Fix pretrained_config_archive_map
* Fix docs
* Add model to README
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply more suggestions from code review
* Make code more readable
* Add ViltForNaturalLanguageVisualReasoning to the tests
* Rename ViltForVisualQuestionAnswering to ViltForQuestionAnswering
* Replace pixel_values_2 by single tensor
* Add hidden_states and attentions
* Fix one more test
* Fix all tests
* Update year
* Fix rebase issues
* Fix another rebase issue
* Remove ViltForPreTraining from auto mapping
* Rename ViltForImageRetrievalTextRetrieval to ViltForImageAndTextRetrieval
* Make it possible to use BertTokenizerFast in the processor
* Use BertTokenizerFast by default
* Rename ViltForNaturalLanguageVisualReasoning, define custom model output
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Rename compute_loss to hf_compute_loss to avoid conflicts with the new Keras method
* make style
* Adding deprecation warning to `compute_loss`
* Fix sneaky reference to compute_loss
* Replace logger.warning with warnings.warn
* Clarifying warning and deprecation timeline
* up
* improve readme
* up
* up
* more info
* up
* up
* Apply suggestions from code review
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* add more stuff for eval
* update
* up
* Update README.md
* Update examples/research_projects/xls_r/README.md
Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.github.com>
* apply omar's suggestions
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.github.com>
* fix doc example - MarianForCausalLM example
* try to keep copies
* fix copies
* fix more similar doc examples
* fix more
* fix style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* First draft
* More improvements
* More improvements
* More improvements
* Fix embeddings
* Add conversion script
* Finish conversion script
* More improvements
* Fix forward pass
* Remove print statements
* Add weights initialization
* Add initialization of decoder weights
* Add support for other models in the conversion script
* Fix patch_size for huge model
* Fix most of the tests
* Fix integration test
* Fix docs
* Fix archive_list
* Apply suggestions from code review
* Improve documentation
* Apply more suggestions
* Skip some tests due to non-deterministic behaviour
* Fix test_initialization
* Remove unneccessary initialization of nn.Embedding
* Improve docs
* Fix dummies
* Remove ViTMAEFeatureExtractor from docs
* Add model to README and table of contents
* Delete inference file
* Fix deprecation warnings for int div
Co-authored-by: mgoldey <matthew.goldey@gmail.com>
* Fix import
* ensure that tensor output is python scalar
* make backward compatible
* make code more readable
* adapt test functions
Co-authored-by: mgoldey <matthew.goldey@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix doc example - NameError: name 'PATH' is not defined
* fix name 'TFRagModel' is not defined
* correct TFRagRagSequenceForGeneration
* fix name 'tf' is not defined
* fix style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* update XLMProphetNet link
* update DPR link
* change prophetnet link
* change link MBART
* change link GPT
* update gpt2 link
* ctrl update link
* update Transformer-XL link
* Update Reformer link
* update xlnet link
* bert update link
* udpate albert link
* roberta update link
* update distilbert link
* update convbert link
* update XLM link
* xlm roberta update link
* update Flaubert link
* update electra link
* update funnel transformer and longformer
* bart update link
* pegasus update link
* udpate marianmt link
* t5 update link
* mt5 update link
* Add ONNX classes to main package
* Remove permalinks from ONNX guide
* Fix ToC entry
* Revert "Add ONNX classes to main package"
This reverts commit eb794a5b00d66b0b4eab234987301676d8357630.
* Add ONNX classes to main doc
* Fix syntax highlighting in doc
* Fix text
* Add FeaturesManager to doc
* Use paths to reference ONNX classes
* Add FeaturesManager to init
* Add missing ONNX paths
* Pipeline ASR with LM.
* Revamped into `self.decoder`.
* Fixing.
* 2nd fix.
* Update src/transformers/pipelines/__init__.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fixing.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
All specific tokenizer config properties must be passed to its base
class (XLMTokenizer) in order to be saved. This was not the case for
do_lowercase config. Thus it was not saved by save_pretrained() method
and saving and reloading the tokenizer changed its behaviour.
This commit fixes it.
* Add IBertOnnxConfig and tests
* add all the supported features for IBERT and remove outputs in IbertOnnxConfig
* use OnnxConfig
* fix codestyle
* remove serialization.rst
* codestyle
* Start the work on TFVisionEncoderDecoderModel
* Expose TFVisionEncoderDecoderModel
* fix import
* Add modeling_tf_vision_encoder_decoder to _ignore_modules in get_model_modules()
* reorder
* Apply the fix for checkpoint loading as in #14016
* remove attention_mask + fix VISION_DUMMY_INPUTS
* A minimal change to make TF generate() work for vision models as encoder in encoder-decoder setting
* fix wrong condition: shape_list(input_ids) == 2
* add tests
* use personal TFViTModel checkpoint (for now)
* Add equivalence tests + projection layer
* style
* make sure projection layer can run
* Add examples
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean comments (need to work on TODOs for PyTorch models)
* Remove TF -> PT in check_pt_tf_equivalence for TFVisionEncoderDecoderModel
* fixes
* Revert changes in PT code.
* Update tests/test_modeling_tf_vision_encoder_decoder.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add test_inference_coco_en for TF test
* fix quality
* fix name
* build doc
* add main_input_name
* Fix ckpt name in test
* fix diff between master and this PR
* fix doc
* fix style and quality
* fix missing doc
* fix labels handling
* Delete auto.rst
* Add the changes done in #14016
* fix prefix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Make OpenAIGPTTokenizer work with SpaCy 3.x
SpaCy 3.x introduced an API change to creating the tokenizer that
breaks OpenAIGPTTokenizer. The old API for creating the tokenizer in
SpaCy 2.x no longer works under SpaCy 3.x, but the new API for creating
the tokenizer in SpaCy 3.x DOES work under SpaCy 2.x. Switching to the
new API should allow OpenAIGPTTokenizer to work under both SpaCy 2.x and
SpaCy 3.x versions.
* Add is_spacy_available and is_ftfy_available methods to file utils
* Add spacy and ftfy unittest decorator to testing utils
* Add tests for OpenAIGPTTokenizer that require spacy and ftfy
* Modify CircleCI config to run tests that require spacy and ftfy
* Remove unneeded unittest decorators are reuse test code
* Run make fixup
* fix doc example - AttributeError: 'numpy.ndarray' object has no attribute 'to'
* fix more
* Apply suggestions from code review
* Update src/transformers/models/unispeech/modeling_unispeech.py
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Hotfix `chunk_length_s` instead of `_ms`.
* Adding fix of `pad_token` which should be last/previous token for CTC
proper decoding
* Fixing ChunkPipeline unwrapping.
* Adding a PackIterator specific test.
* Add FlaxRoFormer
* Clean code + make quality
* Fix output pooling for FlaxRoFormerForMultipleChoiceModule
* Apply suggestions from code review
* add flax model to repos
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Enabling `tokenizers` upgrade.
* Moved ugly comment.
* Tokenizers==0.11.1 needs an update to keep borrow checker
happy in highly contiguous calls.
* Support both 0.11.1 and 0.11.0
* [AutoProcessor] Correct AutoProcessor and automatically add processor class
* up
* up
* up
* up
* up
* up
* up
* up
* continue tomorrow
* up
* up
* up
* make processor class private
* fix loop
* start
* add gradient checkpointing and feature extractor freezing
* Apply suggestions from code review
* up
* up
* up
* correct
* up
* more changes
* up
* up
* up
* remove rst
* Fix bad examples
* Add black formatting to style_doc
* Use first nonempty line
* Put it at the right place
* Don't add spaces to empty lines
* Better templates
* Deal with triple quotes in docstrings
* Result of style_doc
* Enable mdx treatment and fix code examples in MDXs
* Result of doc styler on doc source files
* Last fixes
* Break copy from
* New doc styler
* Fix issue with args at the start
* Code sample fixes
* Style code examples in MDX
* Fix more patterns
* Typo
* Typo
* More patterns
* Do without black for now
* Get more info in error
* Docstring style
* Re-enable check
* Quality
* Fix add_end_docstring decorator
* Fix docstring
* Fix duplicate call to save_checkpoint when using deepspeed / stage3_gather_fp16_weights_on_model_save
* Revert "Fix duplicate call to save_checkpoint when using deepspeed / stage3_gather_fp16_weights_on_model_save"
This reverts commit 6a3dec0397723a8417351dc38fdebf14ab17756c.
* Delete correct duplicate invocation of deepspeed save_checkpoint
* Add ElectraForCausalLM and cover some basic tests & need to fix a few tests
* Fix bugs
* make style
* make fix-copies
* Update doc
* Change docstring to markdown format
* Remove redundant update_keys_to_ignore
* Pipeline chunks.
* Batching for Chunking pipelines ?
* Batching for `question-answering` and `zero-shot-cls`.
* Fixing for FNet.
* Making ASR a chunk pipeline.
* Chunking ASR API.
* doc style.
* Fixing ASR test.
* Fixing QA eror (p_mask, padding is 1, not 0).
* Enable both vad and simple chunking.
* Max length for vad.
* remove inference mode, crashing on s2t.
* Revert ChunkPipeline for ASRpipeline.
Too many knobs for simple integration within the pipeline, better stick
to external convenience functions instead, more control to be had,
simpler pipeline and also easier to replace with other things later.
* Drop necessity for PT for these.
* Enabling generators.
* Add mic + cleanup.
* Typo.
* Typo2.
* Remove ASR work, it does not belong in this PR anymore.
* Update src/transformers/pipelines/pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/pipelines/zero_shot_classification.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Adding many comments.
* Doc quality.
* `hidden_states` handling.
* Adding doc.
* Bad rebase.
* Autofixing docs.
* Fixing CRITICAL bug in the new Zerocls pipeline.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
As `jax` cuda requires special instructions to be installed correctly add a link to jax installation instructions.
Note: Flax install page only covers cpu jax installation info.
* First commit to add MarianMT to ONNX
* Now MarianModel.forward() automatically generates decoder_input_ids, like BartModel.forward()
* Adjusted MarianOnnxConfig.inputs and outputs to work with seq2seq-lm feature
* Style fix
* Added support for other features for already supported models
* Partial support for causal and seq2seq models
* Partial support for causal and seq2seq models
* Add default task for MarianMT ONNX
* Remove automatic creation of decoder_input_ids
* Extend inputs and outputs for MarianMT ONNX config
* Add MarianMT to ONNX unit tests
* Refactor
* OnnxSeq2SeqConfigWithPast to support seq2seq models
* Parameterized the onnx tests
* Restored run_mlm.py
* Restored run_mlm.py
* [WIP] BART update
* BART and MBART
* Add past_key_values and fix dummy decoder inputs
Using a sequence length of 1 in generate_dummy_outputs() produces large discrepancies, presumably due to some hidden optimisations.
* Refactor MarianOnnxConfig to remove custom past_key_values logic
* Fix quality
* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"
This reverts commit 0f4e39c5599523c110cd713f60a3bfa145dad807.
* is_torch_available test to avoid failing imports
* sorting parameterize parameters to solve ERROR gw0 gw1
* tests fix
* tests fix
* GPT2 with past fix
* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially
* Removed onnx file
* Refactor Marian export to account for base changes
* Fix copies
* Implemented suggestions
* Extend support for causal LM
* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"
This reverts commit 0f4e39c5599523c110cd713f60a3bfa145dad807.
* is_torch_available test to avoid failing imports
* sorting parameterize parameters to solve ERROR gw0 gw1
* tests fix
* tests fix
* GPT2 with past fix
* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially
* Removed onnx file
* Implemented suggestions
* Fixed __init__ to resolve conflict with master
* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"
This reverts commit 0f4e39c5599523c110cd713f60a3bfa145dad807.
* is_torch_available test to avoid failing imports
* sorting parameterize parameters to solve ERROR gw0 gw1
* tests fix
* tests fix
* GPT2 with past fix
* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially
* Removed onnx file
* Implemented suggestions
* Fixed __init__ to resolve conflict with master
* Remove commented import
* Remove ONNX model
* Remove redundant class method
* Tidy up imports
* Fix quality
* Refactor dummy input function
* Add copied from statements to Marian config functions
* Remove false copied from comments
* Fix copy from comment
Co-authored-by: Massimiliano Bruni <massimiliano.bruni@hcl.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Working on splitting out labels
* First working version
* Fixed concatenation of outputs and labels
* val_dataset -> eval_dataset
* Only pass input arrays in tokenizer.model_input_names
* Only pass input arrays in tokenizer.model_input_names
* Only remove unexpected keys when predict_with_generate is True
* Adding proper docstring
* Adding example to docstring
* Add a proper ROUGE metric example
* Add a proper ROUGE metric example
* Add version checking
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Remove requirement for tokenizer with predict_with_generate
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"
This reverts commit 0f4e39c5599523c110cd713f60a3bfa145dad807.
* is_torch_available test to avoid failing imports
* sorting parameterize parameters to solve ERROR gw0 gw1
* tests fix
* tests fix
* GPT2 with past fix
* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially
* Removed onnx file
* Implemented suggestions
* Fixed __init__ to resolve conflict with master
* Remove commented import
* add tests
* change post-processor, pre-tokenizer and decoder (can't update decoder)
* update test (remove decoder which doesn't depend on trim and add_prefix)
* just update the post_processor
* fix change
* `trim_offsets` has no influence on `pre_tokenizer`
* remove a test that need some input from the `tokenizers` lib maintainers
* format
* add new test offsets roberta
* polish comments
* Convert docstrings of all configurations and tokenizers
* Processors and fixes
* Last modeling files and fixes to models
* Pipeline modules
* Utils files
* Data submodule
* All the other files
* Style
* Missing examples
* Style again
* Fix copies
* Say bye bye to rst docstrings forever
* add custom `stopping_criteria` and `logits_processor` to `generate`
* add tests for custom `stopping_criteria` and `logits_processor`
* fix typo in RAG
* address reviewer comments
* improve custom logits processor/stopping criteria error message
* fix types in merge function signature
* change default for custom list from `None` to empty list
* fix rag generate
* add string split suggestion
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Convert file_utils docstrings to Markdown
* Test on BERT
* Return block indent
* Temporarily disable doc styler
* Remove from quality checks as well
* Remove doc styler mess
* Remove check from circleCI
* Fix typo
* Convert file_utils docstrings to Markdown
* Test on BERT
* Return block indent
* Temporarily disable doc styler
* Remove from quality checks as well
* Remove doc styler mess
* Remove check from circleCI
* Fix typo
* Let's go on all other model files
* Add templates too
* Styling and quality
* Add a main_input_name attribute to all models
* Fix tests
* Wtf Vs Code?
* Update src/transformers/models/imagegpt/modeling_imagegpt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Style
* Fix copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Implement head_mask for Flax BERT and other models copied from BERT
* Remove `from jax._src.nn.functions import sigmoid`
Remove `from jax._src.nn.functions import sigmoid` unintentionally added by IDE
* Remove no more valid copy statement
* Apply patil-suraj's suggestions from code review
* Apply suggestions from the code review
* Update Flax template
* Fix a typo
* Also update template for CausalLM modules
* PoC for conserving old links
* Do the same for other links
* remap the redirects section
* add instructions on how to move sections
* improve
Co-authored-by: Stas Bekman <stas@stason.org>
* Initial commit for Keras model cards
* Revert accidental change
* make style
* make style
* make style
* Fix PR comments
* Move repo creation to __init__
* Fixes to README.md creation
* Partial progress for proper card creation on `push_to_hub`
* Proper card creation from `push_to_hub` plus fixes for malformed model cards
* Fixes for model card creation outside the callback
* Adding a model card creation test
* Putting the model card creation test in the right file.
Good job, Matt.
* make style
* Fix model card test temp dir usage
* Fix model card creation when no optimizer present
* Fixes for when training history not present
* Fix accidental edit to test_modeling_common
* Adding support for multiple mask tokens.
- Original implem: https://github.com/huggingface/transformers/pull/10222
Co-authored-by: njafer <naveen.jafer@oracle.com>
* In order to accomodate optionally multimodal models like Perceiver
we add information to the tasks to specify tasks where we know for sure
if we need the tokenizer/feature_extractor or not.
* Adding info in the documentation about multi masks.
+ marked as experimental.
* Add a copy() to prevent overriding the same tensor over and over.
* Fixup.
* Adding small test for multi mask with real values..
Co-authored-by: njafer <naveen.jafer@oracle.com>
* Adding some slow test to check for perceiver at least from a high level.
* Re-enabling fast tests for Perceiver ImageClassification.
* Perceiver might try to run without Tokenizer (Fast doesn't exist) and
with FeatureExtractor some text only pipelines.
* Oops.
* Adding a comment for `update_config_with_model_class`.
* Remove `model_architecture` to get `tiny_config`.
* Finalize rebase.
* Smarter way to handle undefined FastTokenizer.
* Remove old code.
* Addressing some nits.
* Don't instantiate `None`.
* Fix doc examples: cannot import name
* remove copy because of some necessary minor changes (maybe add copy to the individual methods instead)
* Keep copy with some modifications
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
- Do not run image-classification pipeline (_CHECKPOINT_FOR_DOC uses the checkpoint for
langage, which cannot load a FeatureExtractor so current logic fails).
- Add a safeguard to not run tests when `tokenizer_class` or
`feature_extractor_class` **are** defined, but cannot be loaded
This happens for Perceiver for the "FastTokenizer" (which doesn't exist
so None) and FeatureExtractor (which does exist but cannot be loaded
because the checkpoint doesn't define one which is reasonable for the
said checkpoint)
- Added `get_vocab` function to `PerceiverTokenizer` since it is used by
`fill-mask` pipeline when the argument `targets` is used to narrow a
subset of possible values.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Add some nicety flags for better controlling evaluation.
* Fix dependency issue with outdated requirement
* Add additional flag to example to ensure eval is done
* Wrap code into main function for accelerate launcher to find
* Fix valid batch size flag in readme
* Add note to install git-lfs when initializing/training the model
* Update examples/research_projects/codeparrot/scripts/arguments.py
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Update examples/research_projects/codeparrot/README.md
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Revert "Wrap code into main function for accelerate launcher to find"
This reverts commit ff11df1c810d4df198d04b827538eb4572147ba3.
* Fix formatting issue
* Move git-lfs instructions to installation section
* Add a quick check before code generation for code evaluation
* Fix styling issue
* Update examples/research_projects/codeparrot/scripts/human_eval.py
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* Make iterable dataset use passed in tokenizer rather than globally defined one
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: ncoop57 <nac33@students.uwf.edu>
* Test workflow
* Build doc
* Make a clean build
* Add doc config
* Restore other workflows
* Final job
* Print something in else statements
* Pull before making changes
* Fix doc examples: name '...' is not defined
* remove >>> and ... in some docstrings in visual_bert
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* change args to address overwriting issue
* remove project name from args
* remove passing args as kwargs to experiment object
* remove passing args as kwargs to offline experiment
* fix offline directory assignment in experiment kwargs
* log checkpoint folder on training end
* log entire output_dir as asset folder
* log asset folder recursively
* end experiment at the end of training
* clean up
* clean up
* Default to always log training assets to Comet when using CometCallback
* change logging training assets to be true when running callback setup
* fix so that experiment always ends when training ends
* styling and quality fixes
* update docstring for COMET_LOG_ASSETS environment variable
* run styling and quality checks
* clean up to docstring
* remove merge markers
* change asset logging to false to avoid hitting max assets per experiment limit
* update training asset description
* fix styling
* fix: verify jsonl in run_translation (#14660)
* fix(run_translation.py): json/jsonl validation
Both json and jsonl are to be accepted as valid jsonlines file extension
* fix(run_translation.py): make black happy
* Ran make style
* Convert a few docs
* And another
* Last tutorials
* New syntax for colab links
* Convert a few docs
* And another
* Last tutorials
* New syntax for colab links
* Added support for other features for already supported models
* Partial support for causal and seq2seq models
* Partial support for causal and seq2seq models
* OnnxSeq2SeqConfigWithPast to support seq2seq models
* Parameterized the onnx tests
* Restored run_mlm.py
* Restored run_mlm.py
* [WIP] BART update
* BART and MBART
* Added comments
* Another sequence length of the past_key_values
* First draft
* Style and remove mlm
* Make forward pass work
* More improvements
* More improvements
* Fix bug
* More improvements
* More improvements
* Add PerceiverTokenizer first draft
* Improve conversion script
* More improvements
* Make conversion script work for the encoder
* Make conversion script work with local pickle files
* Style & quality, fix-copies
* Add dummy input to conversion script
* Add absolute position embeddings to TextPreProcessor
* Make forward pass of encoder work
* More improvements
* Move text preprocessor to separate script
* More improvements
* More improvements
* Add post processor
* Make MLM model work
* Style
* Add PerceiverForMaskedLM
* Add PerceiverImagePreprocessor
* Make style
* Make PerceiverForImageClassification work
* More improvements
* More improvements
* Use tokenizer in conversion script
* Use PerceiverForMaskedLM in conversion script
* Define custom PerceiverModelOutput
* Improve PerceiverAttention to make it work for both MLM and image classification
* More improvements
* More improvements
* More improvements to the conversion script
* Make conversion script work for both MLM and image classification
* Add PerceiverFeatureExtractor
* More improvements
* Style and quality
* Add center cropping
* Fix bug
* Small fix
* Add print statement
* Fix bug in image preprocessor
* Fix bug with conversion script
* Make output position embeddings an nn.Parameter layer instead of nn.Embedding
* Comment out print statements
* Add position encoding classes
* More improvements
* Use position_encoding_kwargs
* Add PerceiverForImageClassificationFourier
* Make style & quality
* Add PerceiverForImageClassificationConvProcessing
* Style & quality
* Add flow model
* Move processors to modeling file
* Make position encodings modular
* Make basic decoder use modular position encodings
* Add PerceiverForOpticalFlow to conversion script
* Add AudioPreprocessor
* Make it possible for the basic decoder to use Fourier position embeddings
* Add PerceiverForMultimodalAutoencoding
* Improve model for optical flow
* Improve _build_network_inputs method
* Add print statement
* Fix device issue
* Fix device of Fourier embeddings
* Add print statements for debugging
* Add another print statement
* Add another print statement
* Add another print statement
* Add another print statement
* Improve PerceiverAudioPreprocessor
* Improve conversion script for multimodal modal
* More improvements
* More improvements
* Improve multimodal model
* Make forward pass multimodal model work
* More improvements
* Improve tests
* Fix some more tests
* Add output dataclasses
* Make more tests pass
* Add print statements for debuggin
* Add tests for image classification
* Add PerceiverClassifierOutput
* More improvements
* Make more tests pass for the optical flow model
* Make style & quality
* Small improvements
* Don't support training for optical flow model for now
* Fix _prepare_for_class for tests
* Make more tests pass, add some docs
* Add multimodal model to tests
* Minor fixes
* Fix tests
* Improve conversion script
* Make fixup
* Remove pos_dim argument
* Fix device issue
* Potential fix for OOM
* Revert previous commit
* Fix test_initialization
* Add print statements for debugging
* Fix print statement
* Add print statement
* Add print statement
* Add print statement
* Add print statement
* Add print statement
* Add print statement
* Remove need for output_shape
* Comment out output_shape
* Remove unnecessary code
* Improve docs
* Fix make fixup
* Remove PerceiverTextProcessor from init
* Improve docs
* Small improvement
* Apply first batch of suggestions from code review
* Apply more suggestions from code review
* Update docstrings
* Define dicts beforehand for readability
* Rename task to architecture in conversion script, include PerceiverModel in tests
* Add print statements for debugging
* Fix tests on GPU
* Remove preprocessors, postprocessors and decoders from main init
* Add integration test
* Fix docs
* Replace einops by torch
* Update for new docs frontend
* Rename PerceiverForImageClassification
* Improve docs
* Improve docs
* Improve docs of PerceiverModel
* Fix some more tests
* Improve center_crop
* Add PerceiverForSequenceClassification
* Small improvements
* Fix tests
* Add integration test for optical flow model
* Clean up
* Add tests for tokenizer
* Fix tokenizer by adding special tokens properly
* Fix CI
* up
* up
* up
* make it cleaner
* correct
* make styhahalal
* add more tests
* finish
* small fix
* make style
* up
* tryout to solve cicrle ci
* up
* fix more tests
* fix more tests
* apply sylvains suggestions
* fix import
* correct docs
* add pyctcdecode only to speech tests
* fix more tests
* add tf, flax and pt tests
* add pt
* fix last tests
* fix more tests
* Apply suggestions from code review
* change lines
* Apply suggestions from code review
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* correct tests
* correct tests
* add doc string
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* fix bug, trainer_seq2seq.py, Line 172, generation_inputs must be a dict before feeding into self.model.generation()
* fix bug, trainer_seq2seq.py, Line 172, generation_inputs must be a dict before feeding into self.model.generation()
* quick fix SummarizationPipeline error messages
Fix error messages to avoid spam errors, and errors of type:
`Your max_length is set to 50, but you input_length is only 46. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)`
* correcto SummarizationPipeline error messages fixes
* implement MLukeTokenizer and LukeForMaskedLM
* update tests
* update docs
* add LukeForMaskedLM to check_repo.py
* update README
* fix test and specify the entity pad id in tokenization_(m)luke
* fix EntityPredictionHeadTransform
* add cross_attention_hidden_size to text-2-text encoder-decoder models (PT/Flax)
* for TFEncoderDecoderModel
* add equivalence test for TFEncoderDecoderModel
* fix
* fix failed equivalence tests
* remove unused import
* add detailed comment
* Fix check_equivalence_tf_to_pt by using encoder/decoder
* cleaning
* Use cross_attention_hidden_size in speech-to-text
* clean fast init logging msg in encoder decoder models
* increase tol from 1e-5 to 1e-3 for tf test
* style
* style
* make sure projection layer can run
* remove type conversion + add check
* fix conflict (config.output_hidden_size)
* Remove TF -> PT in check_pt_tf_equivalence for TFEncoderDecoderModel
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add AutoProcessor class
Init and tests
Add doc
Fix init
Update src/transformers/models/auto/processing_auto.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Reverts to tokenizer or feature extractor when available
Adapt test
* Revert "Adapt test"
This reverts commit bbdde5fab02465f24b54b227390073082cb32093.
* Revert "Reverts to tokenizer or feature extractor when available"
This reverts commit 77659ff5d21b6cc0baf6f443017e35e056a525bb.
* Don't revert everything Lysandre!
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* Update code to resolve comments left in previous PR.
* Add README.md file for this example.
* Update examples/onnx/pytorch/translation/README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update examples/onnx/pytorch/translation/README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update examples/onnx/pytorch/translation/README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update README.md file to resolve comments.
* Add a section name.
* Update examples/onnx/pytorch/translation/README.md
Co-authored-by: Gary Miguel <garymm@garymm.org>
* Add more comments for _convert_past_list_to_tuple().
* Change the default file name to a consistent one.
* Fix a format issue.
* Update examples/onnx/pytorch/translation/README.md
Co-authored-by: Gary Miguel <garymm@garymm.org>
* Update examples/onnx/pytorch/translation/run_onnx_exporter.py
Co-authored-by: Gary Miguel <garymm@garymm.org>
* Update examples/onnx/pytorch/translation/README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Change the folder to summarization and address some other coments.
* Update the torch version.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Gary Miguel <garymm@garymm.org>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* add test for glue
* add tests for clm
* fix clm test
* add summrization tests
* more tests
* fix few tests
* add test for t5 mlm
* fix t5 mlm test
* fix tests for multi device
* cleanup
* ci job
* fix metric file name
* make t5 more robust
* Make DefaultDataCollator importable from root
* Add documentation for DefaultDataCollator and add return_tensors argument to all class docstrings
* make style
* Add DefaultDataCollator to data_collator.rst
* Add DefaultDataCollator to data_collator.rst
* fix#14524 (IndexError when mask prob is too low)
* fix formatting
* correct documentation, add option for setting min_num_masks
* change the semantic meaning of `mask_prob` in _compute_mask_indices
With this commit the meaing of `mask_prob` actually adhered to the probability for each
vector to be the start of a masked span of length.
* fix check_copies test
* fix documentation to semantic meaning of `upper bound of overall masking percentage`, revert changes to _compute_mask_indices
* fix typo
* started bf16 integration
* minor changes
* code now runs
* style
* lay foundation for bf16 testing
* lay foundation for bf16 testing
* start the tests
* better bf16 check
* style
* 2 separate checkers - one for bf16 support, another for bf16+autocast
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* a couple of comment resolutions
* more comment resolutions
* resolved a small bug
* just some print statemtns
* added todo marking
* added a todo
* adjust for API change s/fast_dtype/dtype/
* fix style
* merge 2 bf16 util functions
* bf16 now does scaling too
* Add support for bfloat16
* Revert T5 layernorm to float32
This is based on the comment at https://github.com/huggingface/transformers/pull/14448/files#r752660929 and the PyTorch PR https://github.com/pytorch/pytorch/pull/66920 .
* Add comment about conversion to float32 before returning the numpy data
* Add comment about AMP-bfloat16 incompatibility
* Fix formatting
* typo
* reformer / bf16
* cleanup
* require at least pt-1.10
* fix
* will deal with deepspeed separately
* cleanup
* revert
* cleanup
* fp16_full_eval and bf16_full_eval are separate modes
* proper deprecation
* cleanup
* test and fixes
* spelling
* cleanup
* add a note that this API is experimental
Co-authored-by: jamie <jamie@cortx.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: suriya <suriya@cortx.com>
Co-authored-by: Manuel R. Ciosici <manuelrciosici@gmail.com>
* Init Flax implementation for Blenderbot
* Add a majority of stuff except for tests
* make style quality
* Add tests and fix some bugs
* Add tests
* Clean source code and fix some bugs
* Fix copies and docs
* Fix jax device condition for tests
* Fix layer norm in the encoder
* Fix a few typos in the test file
* make fix-copies
* make fix-copies
* fix layer norm
* Fix Flax params dtype (#13090)
* Fix PR reference (#13098)
* make fix-copies
* Update tests/test_modeling_flax_blenderbot.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* TF Tapas first commit
* updated docs
* updated logger message
* updated pytorch weight conversion
script to support scalar array
* added use_cache to tapas model config to
work properly with tf input_processing
* 1. rm embeddings_sum
2. added # Copied
3. + TFTapasMLMHead
4. and lot other small fixes
* updated docs
* + test for tapas
* updated testing_utils to check
is_tensorflow_probability_available
* converted model logits post processing using
numpy to work with both PT and TF models
* + TFAutoModelForTableQuestionAnswering
* added TF support
* added test for
TFAutoModelForTableQuestionAnswering
* added test for
TFAutoModelForTableQuestionAnswering pipeline
* updated auto model docs
* fixed typo in import
* added tensorflow_probability to run tests
* updated MLM head
* updated tapas.rst with TF model docs
* fixed optimizer import in docs
* updated convert to np
data from pt model is not
`transformers.tokenization_utils_base.BatchEncoding`
after pipeline upgrade
* updated pipeline:
1. with torch.no_gard removed, pipeline forward handles
2. token_type_ids converted to numpy
* updated docs.
* removed `use_cache` from config
* removed floats_tensor
* updated code comment
* updated Copyright Year and
logits_aggregation Optional
* updated docs and comments
* updated docstring
* fixed model weight loading
* make fixup
* fix indentation
* added tf slow pipeline test
* pip upgrade
* upgrade python to 3.7
* removed from_pt from tests
* revert commit f18cfa9
* Added the lang argument to apply_tesseract in feature_extraction_layoutlmv2.py, which is used in pytesseract.image_to_data.
* Added ocr_lang argument to LayoutLMv2FeatureExtractor.__init__, which is used when calling apply_tesseract
* Updated the documentation of the LayoutLMv2FeatureExtractor
* Specified in the documentation of the LayoutLMv2FeatureExtractor that the ocr_lang argument should be a language code.
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Split comment into two lines to adhere to the max line size limit.
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* added save_directories for _psave_pretrained_pt and _tf, changed model to tf_model and pt_model, enable the notebook to run cleanly from top to bottom without error
* Update quicktour.rst
* added >>>
* dependencies
* added space
When loading a pretrained tokenizer, a verification is done to ensure
that the actual tokenizer class matches the class it was called from.
If the tokenizer is absent, its config file is loaded from the repo.
However, the cache_dir for downloading is not provided, which leads to
ignoring of the user-specified cache_dir, storing files in several
places and and may result in incorrect warnings when the default
cache_dir is unreachsble.
This commit fixes that.
* test: make sure model configs are jsonifiable
* fix: return python dict instead of config object
* fix: accept pretrained config and use correct class
* Re-enabling slow tests and applying them to core models only
* Re-enabling slow tests and applying them to core models only
* Add new test file to fetcher
* Remove tooslow tests from test_modeling_tf_common.py
* make style
* Style fixes
* Style fixes
* Style fixes
* Style fixes
* Adding core tests to GPT2 and BART
* Removing unused imports
Co-authored-by: niklas.fruehauf <niklas.fruehauf@sovanta.com>
Co-authored-by: matt <rocketknight1@gmail.com>
* add new wav2vec2 translation
* correct
* up
* add tests
* correct end copy
* correct more
* up
* correct unispeech sat
* finish
* finalize
* finish
* up
* stop training when a finite IterableDataset is exhausted
when using an iterable dataset num_epochs is set to
sys.maxsize to make sure all data is consumed
likewise we want to set max_steps high enough
but still stop when all data is consumed
(cherry picked from commit 6f0e1d6363153da9051e93acffe1cbab3a3f3b12)
* fix typo flase -> false
* add test for stopping training on exhausted finite iterable dataset
* remove redundant gradient_accumulation_steps
* run make style
reformat training_args docstring
* Fix gradient_checkpointing backward compatibility
* Remove needless line
* make sure mask prob is big enough and length small enough
* Fix tests
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Adding support for raw python `generator` in addition to `Dataset`
The main goal is to ease the create of streaming data to the pipe.
`Dataset` is more involved and pytorch specific.
This PR, provides a way to use a python iterator too.
This enabled #14250 but can be proposed as a standalone PR.
```python
from transformers import pipeline
def read_data(filename):
with open(filename, 'r') as f:
for line in f:
yield f
pipe = pipeline("text-classification")
for classified in pipe(read_data("large_file.txt")):
print("Success ! ", classified)
```
The main caveat of this, is the interaction with `DataLoader` with
`num_workers>1`. When you have multiple workers, each receive a copy
of the generator (like `IterableDataset`). That means the naive Iterator
will fail since all workers iterate on all items of the generator.
There are ways to do clever "skipping", but it could be bad still
because all workers still do have to pass through all items of the
generator (they just ignore items they don't handle), depending on
the case it might be bad.
Using `num_workers=1` is the simplest fix and if the cost of loading
your data is small enough should be good enough. In the above example
trying to do smart tricks to skip some lines is unlikely to be a net
positive for instance.
If there are better ways to do "jumps" on some data, then using
`Dataset` is more advised (since then differents workers can just jump
themselves).
* Adding iterator support for `tf` too.
* fix loading flax bf16 weights in pt
* fix clip test
* fix t5 test
* add logging statement
* Update src/transformers/modeling_flax_pytorch_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* switch back to native any
* fix check for bf16 weights
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Start the work for TFViTModel
* Convert to TF code - need to check in the follow up commits
* Clean up model code
* Expose TFViTModel
* make style
* make quality
* Add test
* make style & quality
* Fix some imports
* fix wrong usage - *kwargs => ** kwargs
* Fix Conv2D weight loading (PT->TF) issue
* Add tests for images with different sizes + fix model
* Fix some common tests for TFViTModel
* Use inputs instead of input_ids in test_compile_tf_model
* Add a comment about transpose and Conv2D in convert_tf_weight_name_to_pt_weight_name
* Avoid transpose in TFViT call
* Fix Conv2D issue in load_tf2_weights_in_pytorch_model
* Use tf.keras.layers.Conv2D instead of tf.nn.conv2d
* Using simpler heuristic to detect Conv2D layer
* Change convert_tf_weight_name_to_pt_weight_name to return TransposeType
* Check tf_weight_shape is not None before using it
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix missing comma
* fix input dtype
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* correct order of overflowing tokens for LayoutLmV2 tokenizer
* test to check order of overflowing_tokens for a seq of input_ids
* fix up quality
* added suggested changes
* check that tests the bbox sequence
* pair_input test added
* pass quality test
* check bbox sequence added
* unittest method
* comments added
* add overflowing bbox test
* improved "seq_1"
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* improve code quality
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* minor modification to the wav2vec2 modeling file to support tensor-parallelism with DeepSpeed on this HuggingFace model
* refine the comments
* synch changes
* fix comments
* refine comments
* fix format
* Start PR doc
* Cleanup the quality checks and document them
* Add reference in the contributing guide
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename file as per review suggestion
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix of issue #13327: Wrong weight initialization for TF t5 model
* run black formatter
* fix typo
* remove my name tag from comments
Co-authored-by: Shirron <dan.shirron@intel.com>
* Adding support for `truncation` parameter on `feature-extraction`
pipeline.
Fixes#14183
* Fixing tests on ibert, longformer, and roberta.
* Rebase fix.
* minimal fixes to run DataCollatorForWholeWordMask with return_tensors="np" and return_tensors="tf"
* more consinstent implementation for numpy_mask_tokens
* Add cross attentions to TFGPT2Model
* change to is_pt_tf_cross_test
* A minor correction to a comment
* Remove n_ctx when creating self.crossattention
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add Beit model ouput class
* inherting from BaseModelOuputWithPooling
* updated docs if use_mean_pooling is False
* added beit specific outputs in model docs
* changed the import path
* Fix docs
Co-authored-by: Niels Rogge <niels.rogge1@gmail.com>
* check test_configuration_tie
* Fix test_configuration_tie
* make test slow again
* Remove property and use model.module.bind
* revert to slow test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Add first draft
* Make forward pass work
* Improve conversion script
* Add notebook that checks if it works
* Add BeitForSemanticSegmentation to the tests
* More improvements
* Make BeitForSemanticSegmentation consistent with Segformer
* Small bug fix
* Add BeitForSemanticSegmentation to docs
* Make sure model doesn't output hidden states when the user doesn't want to
* Make it possible to convert the large model
* Fix issue
* Fix conversion script for large model
* Add auxiliary_head option to semantic segmentation model
* Apply suggestions from @sgugger's review
* Apply suggestions from code review
* Fix failing test
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Fixing image segmentation for inference mode.
* Update src/transformers/pipelines/base.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adding `handle_long_generation` paramters for `text-generation` pipeline.
* More error handling
* Fixing tests by dropping tf support on this functionality, it needs
`max_new_tokens` to make it possible to understand user's intent.
Otherwise, `max_length` == `tokenizer.model_max_length` <
input_ids.shape[0].
* Fixing doc ?
* Doc ?
* Remove link from doc.
* Catched an issue on roberta.
* Damn doc.
* Non BC proposal ?
* Cleaning the fix ?
* Finally using only a test override.
* Don't need to modify this.
* Bad print.
* Add the support for the fast (rust) implementation of BlenbderbotTokenizer
* Fix a converter and a typo in a doc
* Apply the patil-suraj's suggestion
* (Nitpick) Fast tokenization -> Fast Tokenization in doc
* Apply the SaulLu's suggestion
* Apply Narsil's suggestion to fix test pipelines
* Add encoder_no_repeat_ngram_size according to the Narsil's suggestion
* Revert the last (unnecessary) commit
* Override pipeline config for Blenderbot to allow for larger pos. emb.
* make fix-copies
* Remove n_ctx from configs
* Fix GPTJ and OpenAIGPT, both are acceptable breaking changes as there are no configs such that it breaks
* Remove unecessary n_positions from TFOpenAIGPT
* First draft
* Make tuple output more readable
* Replace assertions by value errors
* Make it possible to predict_with_generate for vision and speech models
* Adapt Seq2SeqTrainer to work with VisionEncoderDecoder/SpeechEncoderDecoder
* Add deprecation warning
* Add copied from statements to vision and speech encoder decoders
* Fix failing test
* Apply @patrickvonplaten's suggestion
* Use reshape instead of view for consistency
* First draft
* Make style & quality
* Improve conversion script
* Add print statement to see actual slice
* Make absolute tolerance smaller
* Fix image classification models
* Add post_process_semantic method
* Disable padding
* Improve conversion script
* Rename to ForSemanticSegmentation, add integration test, remove post_process methods
* Improve docs
* Fix code quality
* Fix feature extractor tests
* Fix tests for image classification model
* Delete file
* Add is_torch_available to feature extractor
* Improve documentation of feature extractor methods
* Apply suggestions from @sgugger's code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply some more suggestions of code review
* Rebase with master
* Fix rebase issues
* Make sure model only outputs hidden states when the user wants to
* Apply suggestions from code review
* Add pad method
* Support padding of 2d images
* Add print statement
* Add print statement
* Move padding method to SegformerFeatureExtractor
* Fix issue
* Add casting of segmentation maps
* Add test for padding
* Add small note about padding
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* unispeech
* add copy from
* remove hubert copy from
* finish for today
* add unispeech-sat
* adapt more
* up
* up
* up
* up
* add modeling
* add tests
* up
* up
* finish
* up
* Apply suggestions from code review
* up
* up
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* up
* up
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Include KerasTensor in allowed types
- This allows propagating symbolic tensors through TFBert models and layers' call(),
which allows converting the subclass models to functional models.
* Style pass
Co-authored-by: Sergio Valcarcel Macua <sergiov@graphcore.ai>
Co-authored-by: matt <rocketknight1@gmail.com>
* Add seq2seq example for QnA on SQuAD Dataset.
* Changes from review - Fixing styling mistakes.
* Added how to example in README, simplified the access to dataset's preprocess function.
* Added tests for the seq2seq QA example.
* Change dataset column name to fix tests.
* Fix test command mistake.
* Add missing argument 'ignore_pad_token_for_loss' from DataTrainingArguments.
* Add missing argument 'num_beams' from DataTrainingArguments.
* Fix processing of output predicted token ids so that tokenizer decode gets appropriate input. Updated assertion conditions on the tests.
* TF Model train and eval step metrics for seq2seq models.
When using a model with a seq2seq output compute metrics against logits.
* Removing vestigial code
Co-authored-by: matt <rocketknight1@gmail.com>
* Add API to register a new object in auto classes
* Fix test
* Documentation
* Add to tokenizers and test
* Add cleanup after tests
* Be more careful
* Move import
* Move import
* Cleanup in TF test too
* Add consistency check
* Add documentation
* Style
* Update docs/source/model_doc/auto.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/models/auto/auto_factory.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* First draft
* Update self-attention of RoBERTa as proposition
* Improve conversion script
* Add TrOCR decoder-only model
* More improvements
* Make forward pass with pretrained weights work
* More improvements
* Some more improvements
* More improvements
* Make conversion work
* Clean up print statements
* Add documentation, processor
* Add test files
* Small improvements
* Some more improvements
* Make fix-copies, improve docs
* Make all vision encoder decoder model tests pass
* Make conversion script support other models
* Update URL for OCR image
* Update conversion script
* Fix style & quality
* Add support for the large-printed model
* Fix some issues
* Add print statement for debugging
* Add print statements for debugging
* Make possible fix for sinusoidal embedding
* Further debugging
* Potential fix v2
* Add more print statements for debugging
* Add more print statements for debugging
* Deubg more
* Comment out print statements
* Make conversion of large printed model possible, address review comments
* Make it possible to convert the stage1 checkpoints
* Clean up code, apply suggestions from code review
* Apply suggestions from code review, use Microsoft models in tests
* Rename encoder_hidden_size to cross_attention_hidden_size
* Improve docs
* Add cross attentions to TFGPT2Model
* Add TFEncoderDecoderModel
* Add TFBaseModelOutputWithPoolingAndCrossAttentions
* Add cross attentions to TFBertModel
* Fix past or past_key_values argument issue
* Fix generation
* Fix save and load
* Add some checks and comments
* Clean the code that deals with past keys/values
* Add kwargs to processing_inputs
* Add serving_output to TFEncoderDecoderModel
* Some cleaning + fix use_cache value issue
* Fix tests + add bert2bert/bert2gpt2 tests
* Fix more tests
* Ignore crossattention.bias when loading GPT2 weights into TFGPT2
* Fix return_dict_in_generate in tf generation
* Fix is_token_logit_eos_token bug in tf generation
* Finalize the tests after fixing some bugs
* Fix another is_token_logit_eos_token bug in tf generation
* Add/Update docs
* Add TFBertEncoderDecoderModelTest
* Clean test script
* Add TFEncoderDecoderModel to the library
* Add cross attentions to TFRobertaModel
* Add TFRobertaEncoderDecoderModelTest
* make style
* Change the way of position_ids computation
* bug fix
* Fix copies in tf_albert
* Remove some copied from and apply some fix-copies
* Remove some copied
* Add cross attentions to some other TF models
* Remove encoder_hidden_states from TFLayoutLMModel.call for now
* Make style
* Fix TFRemBertForCausalLM
* Revert the change to longformer + Remove copies
* Revert the change to albert and convbert + Remove copies
* make quality
* make style
* Add TFRembertEncoderDecoderModelTest
* make quality and fix-copies
* test TFRobertaForCausalLM
* Fixes for failed tests
* Fixes for failed tests
* fix more tests
* Fixes for failed tests
* Fix Auto mapping order
* Fix TFRemBertEncoder return value
* fix tf_rembert
* Check copies are OK
* Fix missing TFBaseModelOutputWithPastAndCrossAttentions is not defined
* Add TFEncoderDecoderModelSaveLoadTests
* fix tf weight loading
* check the change of use_cache
* Revert the change
* Add missing test_for_causal_lm for TFRobertaModelTest
* Try cleaning past
* fix _reorder_cache
* Revert some files to original versions
* Keep as many copies as possible
* Apply suggested changes - Use raise ValueError instead of assert
* Move import to top
* Fix wrong require_torch
* Replace more assert by raise ValueError
* Add test_pt_tf_model_equivalence (the test won't pass for now)
* add test for loading/saving
* finish
* finish
* Remove test_pt_tf_model_equivalence
* Update tf modeling template
* Remove pooling, added in the prev. commit, from MainLayer
* Update tf modeling test template
* Move inputs["use_cache"] = False to modeling_tf_utils.py
* Fix torch.Tensor in the comment
* fix use_cache
* Fix missing use_cache in ElectraConfig
* Add a note to from_pretrained
* Fix style
* Change test_encoder_decoder_save_load_from_encoder_decoder_from_pt
* Fix TFMLP (in TFGPT2) activation issue
* Fix None past_key_values value in serving_output
* Don't call get_encoderdecoder_model in TFEncoderDecoderModelTest.test_configuration_tie until we have a TF checkpoint on Hub
* Apply review suggestions - style for cross_attns in serving_output
* Apply review suggestions - change assert + docstrings
* break the error message to respect the char limit
* deprecate the argument past
* fix docstring style
* Update the encoder-decoder rst file
* fix Unknown interpreted text role "method"
* fix typo
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* adapt wav2vec2
* add example
* add files
* adapt
* remove bogus file
* Apply suggestions from code review
* adapt files more
* upload changes
* del old files
* up
* up
* up
* up
* up
* correct gradient checkpoitning
* add readme
* finish
* finish
* up
* more fixes
* up
* up
* add demo run to readme
* up
* Replace all assert by ValueError in src/transformers/models/electra
* Reformat with black to pass check_code_quality test
* Change some assert to ValueError of modeling_bert & modeling_tf_albert
* Change some assert in multiples models
* Change multiples models assertion to ValueError in order to validate
check_code_style test and models template test.
* Black reformat
* Change some more asserts in multiples models
* Change assert to ValueError in modeling_layoutlm.py to fix copy error in code_style_check
* Add proper message to ValueError in modeling_tf_albert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in models/bert/modeling_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add ValueError message to models/convbert/modeling_tf_convbert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add error message for ValueError to modeling_tf_electra.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in models/tapas/modeling_tapas.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in models/electra/modeling_electra.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add ValueError message in src/transformers/models/bert/modeling_tf_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in src/transformers/models/rembert/modeling_rembert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Simplify logic in src/transformers/models/albert/modeling_albert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* #12789 Replace assert statements with exceptions
* fix-copies: made copy changes to utils_qa.py in examples/pytorch/question-answering and examples/tensorflow/question-answering
* minor refactor for clarity
* enabling using past_key_values together with labels when training in T5ForConditionalGeneration
* test
* Enable past_key_values in T5ForconditionalGeneration while training.
* delete comments
* Tmp.
* Fixing BC for question answering with long context.
* Capping model_max_length to avoid tf overflow.
* Bad workaround bugged roberta.
* Fixing name.
* Symbolic trace dynamic axes support for BERT like models (albert, bert, distilbert, mobilebert, electra, megatron-bert)
* Sanity checks before tracing that make sure the model to trace is supported
* Adapted to PyTorch 1.9
Co-authored-by: Michael Benayoun <michael@huggingface.co>
* update no_* argument
Changes the order so that the no_* argument is created after the original argument AND sets the default for this no_* argument to False
* import copy
* update test
* make style
* Use kwargs to set default=False
* make style
In BartForConditionalGeneration.forward, if labels are provided,
decoder_input_ids are set to the labels shifted to the right.
This is problematic: if decoder_inputs_embeds is also set,
the call to self.model, which eventually gets to BartDecoder.forward,
will raise an error.
The fix is quite simple, similar to what is there already in
BartModel.forward. Mainly, we should not
compute decoder_input_ids if decoder_inputs_embeds is provided.
Co-authored-by: Silviu Vlad Oprea <silviuvo@amazon.co.uk>
* Keras callback to push to hub each epoch, or after N steps
* Reworked the callback to use Repository
* Use an Enum for save_strategy
* Style pass
* Correct type for tokenizer
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/keras_callbacks.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Adding print message to the final upload
* Adding print message to the final upload
* Change how we wait for the last process to finish
* is_done is a property, not a method, derp
* Docstrings and documentation
* Style pass
* Style edit
* Docstring reformat
* Docstring rewrite
* Replacing print with internal logger
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
We use `torch.unique` here only to check whether every elements have
the same value.
Therefore, we can use `torch.unique_consecutive` here.
This function eliminates all but the first element from every consecutive
group of equivalent elements.
Like, if we apply this function to `[1, 2, 2, 1]`, it will result in
`[1, 2, 1]`.
As you could see, this is enough for checking whether every elements
have the same value.
Since `torch.unique_consecutive` do less thing, it is much more faster.
On my computer, it is 25x faster on GPU and 15x faster on CPU.
This moves the assertion on checking input dimensions into a block that will only be called if the function is actually going to do chunking forward. This is often not the case at inference time and PyTorch tracing a model with this assertion in it leads to a tracing warning.
TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
input_tensor.shape[chunk_dim] == tensor_shape for input_tensor in input_tensors
* add sigopt hpo to transformers.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* extend sigopt changes to test code and others..
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Style.
* fix style for sigopt integration.
Signed-off-by: Ding, Ke <ke.ding@intel.com>
* Add necessary information to run unittests on SigOpt.
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* one possible solution
* low mem from_pretrained
* edge cases
* solve the persistent buffers
* style
* parametrize
* for later
* proper solution
* cleanup
* refactor; rework based on suggestions
* revert splitting into 2 parts, move checks into main func
* Use fp16 checkpoints
* Style
* Fix outputs and disable OOM tests
* Correct another output
* Use a random smaller model for generation tests
* repo quickfix
* fix gradient checkpointing
* Raise exceptions instead of using assertions for control flow #12789
* # coding=utf-8
* Raise exceptions instead of using assertions for control flow
* Raise exceptions instead of using assertions for control flow
* Update src/transformers/tokenization_utils.py
Raise exceptions instead of using assertions for control flow
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/tokenization_utils.py
Raise exceptions instead of using assertions for control flow
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Raise exceptions instead of using assertions for control flow
* test
* Raise exceptions instead of using assertions for control flow
Co-authored-by: MocktaiLEngineer <kavinarasu22@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Make gradient_checkpointing a training argument
* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix tests
* Style
* document Gradient Checkpointing as a performance feature
* Small rename
* PoC for not using the config
* Adapt BC to new PoC
* Forgot to save
* Rollout changes to all other models
* Fix typo
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
* Add support for exporting PyTorch LayoutLM to ONNX
* Added tests for converting LayoutLM to ONNX
* Add support for exporting PyTorch LayoutLM to ONNX
* Added tests for converting LayoutLM to ONNX
* cleanup
* Removed regression/ folder
* Add support for exporting PyTorch LayoutLM to ONNX
* Added tests for converting LayoutLM to ONNX
* cleanup
* Fixed import error
* Remove unnecessary import statements
* Changed max_2d_positions from class variable to instance variable of the config class
* Add support for exporting PyTorch LayoutLM to ONNX
* Added tests for converting LayoutLM to ONNX
* cleanup
* Add support for exporting PyTorch LayoutLM to ONNX
* cleanup
* Fixed import error
* Changed max_2d_positions from class variable to instance variable of the config class
* Use super class generate_dummy_inputs method
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Add support for Masked LM, sequence classification and token classification
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* Removed uncessary import and method
* Fixed code styling
* Raise error if PyTorch is not installed
* Remove unnecessary import statement
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
* beit-flax
* updated FLAX_BEIT_MLM_DOCSTRING
* removed bool_masked_pos from classification
* updated Copyright
* code refactoring: x -> embeddings
* updated test: rm from_pt
* Update docs/source/model_doc/beit.rst
* model code dtype updates and
other changes according to review
* relative_position_bias
revert back to pytorch design
* Init FNet
* Update config
* Fix config
* Update model classes
* Update tokenizers to use sentencepiece
* Fix errors in model
* Fix defaults in config
* Remove position embedding type completely
* Fix typo and take only real numbers
* Fix type vocab size in configuration
* Add projection layer to embeddings
* Fix position ids bug in embeddings
* Add minor changes
* Add conversion script and remove CausalLM vestiges
* Fix conversion script
* Fix conversion script
* Remove CausalLM Test
* Update checkpoint names to dummy checkpoints
* Add tokenizer mapping
* Fix modeling file and corresponding tests
* Add tokenization test file
* Add PreTraining model test
* Make style and quality
* Make tokenization base tests work
* Update docs
* Add FastTokenizer tests
* Fix fast tokenizer special tokens
* Fix style and quality
* Remove load_tf_weights vestiges
* Add FNet to main README
* Fix configuration example indentation
* Comment tokenization slow test
* Fix style
* Add changes from review
* Fix style
* Remove bos and eos tokens from tokenizers
* Add tokenizer slow test, TPU transforms, NSP
* Add scipy check
* Add scipy availabilty check to test
* Fix tokenizer and use correct inputs
* Remove remaining TODOs
* Fix tests
* Fix tests
* Comment Fourier Test
* Uncomment Fourier Test
* Change to google checkpoint
* Add changes from review
* Fix activation function
* Fix model integration test
* Add more integration tests
* Add comparison steps to MLM integration test
* Fix style
* Add masked tokenization fix
* Improve mask tokenization fix
* Fix index docs
* Add changes from review
* Fix issue
* Fix failing import in test
* some more fixes
* correct fast tokenizer
* finalize
* make style
* Remove additional tokenization logic
* Set do_lower_case to False
* Allow keeping accents
* Fix tokenization test
* Fix FNet Tokenizer Fast
* fix tests
* make style
* Add tips to FNet docs
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Removed misfiring warnings
* Revert "Removed misfiring warnings"
This reverts commit cea90de325056b9c1cbcda2bd2613a785c1639ce.
* Retain the warning, but only when the user actually overrides things
* Fix accidentally breaking just about every model on the hub simultaneously
* Style pass
* Fix special tokens not correctly tokenized
* Add testing
* Fix
* Fix
* Use user workflows instead of directly assigning variables
* Enable test of fast tokenizers
* Update test of canine tokenizer
* Optimize Token Classification models for TPU
As per the XLA document XLA cannot handle masked indexing well. So token classification
models for BERT and others use an implementation based on `torch.where`. This implementation
works well on TPU.
ALBERT token classification model uses the masked indexing which causes performance issues
on TPU. This PR fixes this issue by following the BERT implementation.
* Same fix for ELECTRA
* Same fix for LayoutLM
* Properly use test_fetcher for examples
* Fake example modification
* Fake modeling file modification
* Clean fake modifications
* Run example tests for any modification.
* Fix issue when labels are supplied as Numpy array instead of list
* Fix issue when labels are supplied as Numpy array instead of list
* Fix same issue in the `TokenClassification` data collator
* Style pass
Update GPT Neo ONNX config to match the changes implied by the simplification of the local attention
Co-authored-by: Michael Benayoun <michael@huggingface.co>
* Add long-overdue link to the Google TRC project
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
* Enabling dataset iteration on pipelines.
Enabling dataset iteration on pipelines.
Unifying parameters under `set_parameters` function.
Small fix.
Last fixes after rebase
Remove print.
Fixing text2text `generate_kwargs`
No more `self.max_length`.
Fixing tf only conversational.
Consistency in start/stop index over TF/PT.
Speeding up drastically on TF (nasty bug where max_length would increase
a ton.)
Adding test for support for non fast tokenizers.
Fixign GPU usage on zero-shot.
Fix working on Tf.
Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Small cleanup.
Remove all asserts + simple format.
* Fixing audio-classification for large PR.
* Overly explicity null checking.
* Encapsulating GPU/CPU pytorch manipulation directly within `base.py`.
* Removed internal state for parameters of the pipeline.
Instead of overriding implicitly internal state, we moved
to real named arguments on every `preprocess`, `_forward`,
`postprocess` function.
Instead `_sanitize_parameters` will be used to split all kwargs
of both __init__ and __call__ into the 3 kinds of named parameters.
* Move import warnings.
* Small fixes.
* Quality.
* Another small fix, using the CI to debug faster.
* Last fixes.
* Last fix.
* Small cleanup of tensor moving.
* is not None.
* Adding a bunch of docs + a iteration test.
* Fixing doc style.
* KeyDataset = None guard.
* RRemoving the Cuda test for pipelines (was testing).
* Even more simple iteration test.
* Correct import .
* Long day.
* Fixes in docs.
* [WIP] migrating object detection.
* Fixed the target_size bug.
* Fixup.
* Bad variable name.
* Fixing `ensure_on_device` respects original ModelOutput.
* Moving slow tokenizer to the Trie world.
* Adding more docstrings to the Trie.
* Fixing doctest (incompatible wiht our format? )
* Update src/transformers/tokenization_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Adding a lot more comment into the internals of this algorithm.
* Cleaner doc.
* Fixing the namings.
* Update src/transformers/tokenization_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* quality.
* Fixing longest first match.
* Small improvements to cuts + more test + canine resistant test.
* Fixing fast test.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* [docs] update dead quickstart link on resuing past for GPT2
Thed dead link have been replaced by two links of forward and call methods of the GPT2 class for torch and tensorflow respectively.
* [docs] fix formatting for gpt2 page update
* refactor GPT Config to allow dyn. properties
* make attribute_map a class attribute
* remove old code
* update unit test to test config: Add test for common properties setter
* update unit test to test config: Add test for common properties passed as parameters to __init__
* update to black code format
* Allow that setters are not defined for certain config classes
* update config classes to implement attribute_map
* bugfix lxmert config - id2labels was not defined when num_labels was set
* update broken configs - add attribute_maps
* update bart config
* update black codestyle
* update documentation on common config attributes
* update GPTJ config to new attribute map
* update docs on common attributes
* gptj config: add max_position_embeddings
* gptj config: format with black
* update speech to text 2 config
* format doc file to max_len 119
* update config template
* [docs] Update perplexity.rst to use negative log likelihood
Model `forward` returns the negative log likelihood. The document correctly defines and calculates perplexity, but the description and variable names are inconsistent, which might cause confusion.
* [docs] restyle perplexity.rst
* correct order of overflowing_tokens for slow tokenizer (issue fix#13148)
* python 3.9 requires sentencepiece version 0.1.94 or above
* slicing of ids fixed in truncated_sequence()
* Update setup.py
* Correct order of overflowing tokens for pair of sentences
* code reformatted
* Update tokenization_utils_base.py
* reformatting file
* test to check single_input added
* missing function restored
* test to check pair_input overflowing tokens order
* test to check pair_input overflowing tokens order
* test to check pair_input overflowing tokens order
* added an error message for pair of seq and longest_first strategy
* test for pair_input modified
* variable name corrected
* fixed a typo in error message
* requested changes implemented
* required test added
* Corrected the message to match test message
* added error message for Luke Tokenizer
* lost test recovered
* docstring for truncate_sequences and prepare_for_model updated
* docstring for luke tokenizer updated
* updated ENCODE_PLUS_ADDITIONAL_KWARGS_DOCSTRING
* aligned text and fixed puncuatations
* improved style and quality of code
* fixed error_msg in truncate_sequences
* replaced encode_plus method with regular call method
* clean up
* rephrased the docstring
* Update clip loss calculation
Hello, I'm the author of the blog you took the snippet from. I think this way of calculating is possibly slightly more accurate for calculation.
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* add test in trainer and test tokenizer saving wi
th trainer
* quality
* reverse trainer changes
* replace test in test_trainer by a test for all the tokenizers
* format
* add can_save_slow_tokenizer attribute to all tokenizers
* fix Herbert
* format
* Change comment in error
* add comments and a new assert
* Update src/transformers/models/albert/tokenization_albert_fast.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* change ValueError barthez
* change ValueError BigBird
* change ValueError Camembert
* change ValueError Mbart50
* change ValueError Pegasus
* change ValueError ReFormer
* change ValueError T5
* change ValueError RoBERTa
* XLNET fast
* Update tests/test_tokenization_common.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* change `assert` into `self.assertIn`
* format
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix_torch_device_generate_test
* remove @
* up
* correct some bugs
* correct model
* finish speech2text extension
* up
* up
* up
* up
* Update utils/custom_init_isort.py
* up
* up
* update with tokenizer
* correct old tok
* correct old tok
* fix bug
* up
* up
* add more tests
* up
* fix docs
* up
* fix some more tests
* add better config
* correct some more things
"
* fix tests
* improve docs
* Apply suggestions from code review
* Apply suggestions from code review
* final fixes
* finalize
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* apply suggestions Lysandre and Sylvain
* apply nicos suggestions
* upload everything
* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: your_github_username <your_github_email>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* added token_type_ids buffer to fix the issue #5664
* Handling the case that position_id buffer is not registered
* added token_type_ids buffer to fix the issue #5664
* modified to support device conversion when the model is traced
* registered buffer for position-ids to address issues similar to issue#5664
* added comment
* added the flag to prevent from adding the buffer into the state_dict
* Add the audio classification pipeline
* Remove autoconfig exception
* Mark ffmpeg test as slow
* Rearrange pipeline tests
* Add small test
* Replace asserts with ValueError
* Add option to add flax
* Add flax template for __init__.py
* Add flax template for .rst
* Copy TF modeling template
* Add a missing line in modeling_tf_... template
* Update first half of modeling_flax_..
* Update encoder flax template
* Copy test_modeling_tf... as test_modeling_flax...
* Replace some TF to Flax in test_modeling_flax_...
* Replace tf to np
some function might not work, like _assert_tensors_equal
* Replace remaining tf to np (might not work)
* Fix cookiecutter
* Add Flax in to_replace_... template
* Update transformers-cli add-new-model
* Save generate_flax in configuration.json
This will be read by transformers-cli
* Fix to_replace_... and cli
* Fix replace cli
* Fix cookiecutter name
* Move docstring earlier to avoid not defined error
* Fix a missing Module
* Add encoder-decoder flax template from bart
* Fix flax test
* Make style
* Fix endif
* Fix replace all "utf-8 -> unp-8"
* Update comment
* Fix flax template (add missing ..._DOCSTRING)
* Use flax_bart imports in template (was t5)
* Fix unp
* Update templates/adding_a_new_model/tests
* Revert "Fix unp"
This reverts commit dc9002a41d902c4f9b07343eab1cb350c8b7fd57.
* Remove one line of copied from to suppress CI error
* Use generate_tensorflow_pytorch_and_flax
* Add a missing part
* fix typo
* fix flax config
* add examples for flax
* small rename
* correct modeling imports
* correct auto loading
* corrects some flax tests
* correct small typo
* correct as type
* finish modif
* correct more templates
* final fixes
* add file testers
* up
* make sure tests match template regex
* correct pytorch
* correct tf
* correct more tf
* correct imports
* minor error
* minor error
* correct init
* more fixes
* correct more flax tests
* correct flax test
* more fixes
* correct docs
* update
* fix
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adding a TF variant of the DataCollatorForTokenClassification to get feedback
* Added a Numpy variant and a post_init check to fail early if a missing import is found
* Fixed call to Numpy variant
* Added a couple more of the collators
* Update src/transformers/data/data_collator.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fixes, style pass, finished DataCollatorForSeqToSeq
* Added all the LanguageModeling DataCollators, except SOP and PermutationLanguageModeling
* Adding DataCollatorForPermutationLanguageModeling
* Style pass
* Add missing `__call__` for PLM
* Remove `post_init` checks for frameworks because the imports inside them were making us fail code quality checks
* Remove unused imports
* First attempt at some TF tests
* A second attempt to make any of those tests actually work
* TF tests, round three
* TF tests, round four
* TF tests, round five
* TF tests, all enabled!
* Style pass
* Merging tests into `test_data_collator.py`
* Merging tests into `test_data_collator.py`
* Fixing up test imports
* Fixing up test imports
* Trying shuffling the conditionals around
* Commenting out non-functional old tests
* Completed all tests for all three frameworks
* Style pass
* Fixed test typo
* Style pass
* Move standard `__call__` method to mixin
* Rearranged imports for `test_data_collator`
* Fix data collator typo "torch" -> "pt"
* Fixed the most embarrassingly obvious bug
* Update src/transformers/data/data_collator.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Renaming mixin
* Updating docs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Dalton Walker <dalton_walker@icloud.com>
Co-authored-by: Andrew Romans <andrew.romans@hotmail.com>
* Deberta_v2 tf
* added new line at the end of file, make style
* +V2, typo
* remove never executed branch of code
* rm cmnt and fixed typo in url filter
* cleanup according to review comments
* added #Copied from
* added missing __spec__ to _LazyModule
* test __spec__ is not None after module import
* changed module_spec arg to be optional in _LazyModule
* fix style issue
* added module spec test to test_file_utils
* Correct outdated function signatures on website.
* Upgrade sphinx to 3.5.4 (latest 3.x)
* Test
* Test
* Test
* Test
* Test
* Test
* Revert unnecessary changes.
* Change sphinx version to 3.5.4"
* Test python 3.7.11
* First commit
* Make style
* Fix dummy objects
* Add Detectron2 config
* Add LayoutLMv2 pooler
* More improvements, add documentation
* More improvements
* Add model tests
* Add clarification regarding image input
* Improve integration test
* Fix bug
* Fix another bug
* Fix another bug
* Fix another bug
* More improvements
* Make more tests pass
* Make more tests pass
* Improve integration test
* Remove gradient checkpointing and add head masking
* Add integration test
* Add LayoutLMv2ForSequenceClassification to the tests
* Add LayoutLMv2ForQuestionAnswering
* More improvements
* More improvements
* Small improvements
* Fix _LazyModule
* Fix fast tokenizer
* Move sync_batch_norm to a separate method
* Replace dummies by requires_backends
* Move calculation of visual bounding boxes to separate method + update README
* Add models to main init
* First draft
* More improvements
* More improvements
* More improvements
* More improvements
* More improvements
* Remove is_split_into_words
* More improvements
* Simply tesseract - no use of pandas anymore
* Add LayoutLMv2Processor
* Update is_pytesseract_available
* Fix bugs
* Improve feature extractor
* Fix bug
* Add print statement
* Add truncation of bounding boxes
* Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer
* Improve tokenizer tests
* Make more tokenizer tests pass
* Make more tests pass, add integration tests
* Finish integration tests
* More improvements
* More improvements - update API of the tokenizer
* More improvements
* Remove support for VQA training
* Remove some files
* Improve feature extractor
* Improve documentation and one more tokenizer test
* Make quality and small docs improvements
* Add batched tests for LayoutLMv2Processor, remove fast tokenizer
* Add truncation of labels
* Apply suggestions from code review
* Improve processor tests
* Fix failing tests and add suggestion from code review
* Fix tokenizer test
* Add detectron2 CI job
* Simplify CI job
* Comment out non-detectron2 jobs and specify number of processes
* Add pip install torchvision
* Add durations to see which tests are slow
* Fix tokenizer test and make model tests smaller
* Frist draft
* Use setattr
* Possible fix
* Proposal with configuration
* First draft of fast tokenizer
* More improvements
* Enable fast tokenizer tests
* Make more tests pass
* Make more tests pass
* More improvements
* Addd padding to fast tokenizer
* Mkae more tests pass
* Make more tests pass
* Make all tests pass for fast tokenizer
* Make fast tokenizer support overflowing boxes and labels
* Add support for overflowing_labels to slow tokenizer
* Add support for fast tokenizer to the processor
* Update processor tests for both slow and fast tokenizers
* Add head models to model mappings
* Make style & quality
* Remove Detectron2 config file
* Add configurable option to label all subwords
* Fix test
* Skip visual segment embeddings in test
* Use ResNet-18 backbone in tests instead of ResNet-101
* Proposal
* Re-enable all jobs on CI
* Fix installation of tesseract
* Fix failing test
* Fix index table
* Add LayoutXLM doc page, first draft of code examples
* Improve documentation a lot
* Update expected boxes for Tesseract 4.0.0 beta
* Use offsets to create labels instead of checking if they start with ##
* Update expected boxes for Tesseract 4.1.1
* Fix conflict
* Make variable names cleaner, add docstring, add link to notebooks
* Revert "Fix conflict"
This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.
* Revert to make integration test pass
* Apply suggestions from @LysandreJik's review
* Address @patrickvonplaten's comments
* Remove fixtures DocVQA in favor of dataset on the hub
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* examples: only use keep_linebreaks when reading TXT files for all CLM examples
* examples: only use keep_linebreaks when reading TXT files for all CLM examples
* examples: only use keep_linebreaks when reading TXT files for all CLM examples
* Add hubert classifier + tests
* Add hubert classifier + tests
* Dummies for all classification tests
* Wav2Vec2 classifier + ER test
* Fix hubert integration tests
* Add hubert IC
* Pass tests for all classification tasks on Hubert
* Pass all tests + copies
* Move models to the SUPERB org
* Moving `zero-shot-classification` pipeline to new testing.
* Cleaning up old mixins.
* Fixing tests
`sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english` is
corrupted in PT.
* Adding warning.
* examples: add keep_linebreaks option to text dataset loader for all CLM examples
* examples: introduce new keep_linebreaks option as data argument in CLM examples
* First commit
* Add interpolation of patch embeddings
* Comment out code
* Fix bug
* Fix another bug
* Fix bug
* Fix another bug
* Remove print statements
* Update conversion script
* Use the official vit implementation
* Add support for converting dino_vits8
* Add DINO to docs of ViT
* Remove assertion
* Add interpolation of position encodings
* Fix bug
* Add align_corners
* Add interpolate_pos_encoding option to forward pass of ViTModel
* Improve interpolate_pos_encoding method
* Add docstring
- Enforce `test_small_models_{tf,pt}` methods to exist (enforce checking
actual values in small tests)
- Add support for non RGB image for the pipeline.
* New test format for conversational.
* Putting back old mixin.
* Re-enabling auto tests with LazyLoading.
* Feature extraction tests.
* Remove feature-extraction.
* Feature extraction with feature_extractor (No pun intended).
* Update check_model_type for fill-mask.
* fix AutoModel.from_pretrained(..., torch_dtype=...)
* fix to_diff_dict
* add better test
* torch is not always available when a model has self.torch_dtype
* make flax gpt2 working with cross attention
* Remove encoder->decoder projection layer
* A draft (incomplete) for FlaxEncoderDecoderModel
* Add the method from_encoder_decoder_pretrained + the docstrings
* Fix the mistakes of using EncoderDecoderModel
* Fix style
* Add FlaxEncoderDecoderModel to the library
* Fix cyclic imports
* Add FlaxEncoderDecoderModel to modeling_flax_auto.py
* Remove question comments
* add tests for FlaxEncoderDecoderModel
* add flax_encoder_decoder to the lists of ignored entries in check_repo.py
* fix missing required positional arguments
* Remove **kwargs when creating FlaxEncoderDecoderModel in from_encoder_decoder_pretrained()
Also fix generation eos/pad tokens issue
* Fix: Use sequences from the generated_output
* Change a check from assert to raise ValueError
* Fix examples and token ids issues
* Fix missing all_cross_attentions when outputting tuple in modeling_gpt2
* Remove the changes in configuration docstrings.
* allow for bert 2 gpt2
* make fix-copies
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Change remaining examples to bert2gpt2
* Change the test to Bert2GPT2
* Fix examples
* Fix import
* Fix unpack bug
* Rename to FlaxEncoderDecoderModelTest and change the test to bert2gpt2
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix: NotImplentedError -> NotImplementedError
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* up
* finalize
Co-authored-by: ydshieh <ydshieh@user.noreply>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add test
* add change in PretrainedTokenizerBase
* change Luke
* deactivate
* add the possibility to add additional special tokens for M2M100
* format
* add special test for canine
* proposed changes for mbart
* proposed changes for mbart50
* proposed changes for byt5
* proposed changes for canine
* proposed changes for t5
* test fast and slow
* remove comment
* remove comment
* add fast version for all tests
* replace break by continue
* add more comments
* add check to avoid duplicates
* remove comment
* format
* proposed change for wave2vec2
* reverse changes mbart
* uncomment
* format
* Barrier -> barrier
* added logger for metrics
* removed stream handler in trainer
* moved handler
* removed streamhandler from trainer
* updated test image and instance type added datasets version to test
* Update tests/sagemaker/scripts/pytorch/requirements.txt
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fill mask pipelines test updates.
* Model eval !!
* Adding slow test with actual values.
* Making all tests pass (skipping quite a bit.)
* Doc styling.
* Better doc cleanup.
* Making an explicit test with no pad token tokenizer.
* Typo.
* Fix inconsistency of the last element in hidden_states between PyTorch/Flax GPT2(Neo) (#13102)
* Fix missing elements in outputs tuple
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Fix local variable 'all_hidden_states' referenced before assignment
* Fix by returning tuple containing None values
* Fix quality
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Create py.typed
This creates a [py.typed as per PEP 561](https://www.python.org/dev/peps/pep-0561/#packaging-type-information) that should be distributed to mark that the package includes (inline) type annotations.
* Update setup.py
Include py.typed as package data
* Update setup.py
Call `setup(...)` with `zip_safe=False`.
* conditional declare `TOKENIZER_MAPPING_NAMES` within a `if TYPE_CHECKING` block so that type checkers dont need to evaluate the RHS of the assignment.
this improves performance of the pylance/pyright type checkers
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* adding missing import
* format
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Change FlaxBartForConditionalGeneration.decode() argument: deterministic -> train
* Also change the parameter name to train for flax marian and mbart
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Doctests
* Limit to 4 decimals
* Try with separate PT/TF tests
* Remove test for TF
* Ellips the predictions
* Doctest continue on failure
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Classification head of AlbertForMultipleChoice uses `hidden_dropout_prob` instead of `classifier_dropout_prob`. This
is not desirable as we cannot change classifer head dropout probability without changing the dropout probabilities of
the whole model.
* Fix doctests for quicktour
* Adapt causal LM exemple
* Remove space
* Fix until summarization
* End of task summary
* Style
* With last changes in quicktour
* Use original key for label in DataCollatorForTokenClassification
DataCollatorForTokenClassification accepts either `label` or `labels` as key for label in it's input. However after padding the label it assigns the padded labels to key `labels`. If originally `label` was used as key than the original upadded labels still remains in the batch. Then at line 192 when we try to convert the batch elements to torch tensor than these original unpadded labels cannot be converted as the labels for different samples have different lengths.
* Fixed style.
* Adding HuggingArtists to Community Notebooks
* Adding HuggingArtists to Community Notebooks
* Adding HuggingArtists to Community Notebooks
* docs: add HuggingArtists to community notebooks
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix_torch_device_generate_test
* remove @
* improve docs for clm
* speed-ups
* correct t5 example as well
* push final touches
* Update examples/flax/language-modeling/README.md
* correct docs for mlm
* Update examples/flax/language-modeling/README.md
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Fix tied weights on TPU
* Manually tie weights in no trainer examples
* Fix for test
* One last missing
* Gettning owned by my scripts
* Address review comments
* Fix test
* Fix tests
* Fix reformer tests
T5 with past ONNX export, and more explicit past_key_values inputs and outputs names for ONNX model
Authored-by: Michael Benayoun <michael@huggingface.co>
* Initial work
* All auto models
* All tf auto models
* All flax auto models
* Tokenizers
* Add feature extractors
* Fix typos
* Fix other typo
* Use the right config
* Remove old mapping names and update logic in AutoTokenizer
* Update check_table
* Fix copies and check_repo script
* Fix last test
* Add back name
* clean up
* Update template
* Update template
* Forgot a )
* Use alternative to fixup
* Fix TF model template
* Address review comments
* Address review comments
* Style
* First pass
* Make conversion script work
* Improve conversion script
* Fix bug, conversion script working
* Improve conversion script, implement BEiTFeatureExtractor
* Make conversion script work based on URL
* Improve conversion script
* Add tests, add documentation
* Fix bug in conversion script
* Fix another bug
* Add support for converting masked image modeling model
* Add support for converting masked image modeling
* Fix bug
* Add print statement for debugging
* Fix another bug
* Make conversion script finally work for masked image modeling models
* Move id2label for datasets to JSON files on the hub
* Make sure id's are read in as integers
* Add integration tests
* Make style & quality
* Fix test, add BEiT to README
* Apply suggestions from @sgugger's review
* Apply suggestions from code review
* Make quality
* Replace nielsr by microsoft in tests, add docs
* Rename BEiT to Beit
* Minor fix
* Fix docs of BeitForMaskedImageModeling
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
help for `ModelArguments.gradient_checkpointing` should be
"If True, use gradient checkpointing to save memory
at the expense of slower backward pass."
not "Whether to freeze the feature extractor layers of the model."
(which is duplicated from `freeze_feature_extractor` arg)
* Update feature extraction pipelilne.
* Leaving 1 small model for actual values check.
* Fixes tests
- Better support for tokenizer with no pad token
- Increasing PegasusModelTesterConfig for pipelines
- Test of feature extraction are more permissive + don't test Multimodel
models + encoder-decoder.
* Fixing model loading with incorrect shape (+ model with HEAD).
* Update tests/test_pipelines_common.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Revert modeling_utils modification.
* Some corrections.
* Update tests/test_pipelines_common.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_pipelines_feature_extraction.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Syntax.
* Fixing text-classification tests.
* Don't modify this file.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Raise an issue if the pytorch version is < 1.8.0
* Attempt to add a test to ensure it correctly raises.
* Missing docstring.
* Second attempt, patch with string absolute import.
* Let's do the call before checking it was called ...
* use the correct function ... 🤦
* Raise ImportError and AssertionError respectively when unable to find torch and torch version is not sufficient.
* Correct path mock patching
* relax constraint for torch_onnx_dict_inputs to ge instead of eq.
* Style.
* Split each version requirements for torch.
* Let's compare version directly.
* Import torch_version after checking pytorch is installed.
* @require_torch
While `Iterable[Iterable[int]]` is a nicer annotation (it's covariant!), the defensive statements parsing out `bad_words_ids` in `__init__(...)` force the caller to pass in `List[List[int]]`. I've changed the annotation to make that clear.
Change `score` -> `scores` because the argument is not positional-only, so you need consistently named parameters for the subclasses. The subclasses appear to favor `scores` over `score`.
* Fixed train_test_split test_size argument
* `Seq2SeqTrainer` set max_length and num_beams only when non None (#12899)
* set max_length and num_beams only when non None
* fix instance variables
* fix code style
* [FLAX] Minor fixes in CLM example (#12914)
* readme: fix retrieval of vocab size for flax clm example
* examples: fix flax clm example when using training/evaluation files
* Fix module path for symbolic_trace example
Co-authored-by: cchen-dialpad <47165889+cchen-dialpad@users.noreply.github.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* Better heuristic for token-classification pipeline.
Relooking at the problem makes thing actually much simpler,
when we look at ids from a tokenizer, we have no way in **general**
to recover if some substring is part of a word or not.
However, within the pipeline, with offsets we still have access to the
original string, so we can simply look if previous character (if it
exists) of a token, is actually a space. This will obviously be wrong
for tokenizers that contain spaces within tokens, tokenizers where
offsets include spaces too (Don't think there are a lot).
This heuristic hopefully is fully bc and still can handle non-word based
tokenizers.
* Updating test with real values.
* We still need the older "correct" heuristic to prevent fusing
punctuation.
* Adding a real warning when important.
* add classifier_dropout to Electra
* no type annotations yet
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add classifier_dropout to Electra
* add classifier_dropout to Electra ForTokenClass.
* add classifier_dropout to bert
* add classifier_dropout to roberta
* add classifier_dropout to big_bird
* add classifier_dropout to mobilebert
* empty commit to trigger CI
* add classifier_dropout to reformer
* add classifier_dropout to ConvBERT
* add classifier_dropout to Albert
* add classifier_dropout to Albert
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Faster list concat for trainer_pt_utils.get_length_grouped_indices() (#11825)
get_length_grouped_indices() in LengthGroupedSampler and DistributedLengthGroupedSampler
is prohibitively slow for large number of megabatches (in test case takes hours for ~270k
megabatches with 100 items each) due to slow list concatenation with sum(megabatches, []).
Resolves: #11795
Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
* Replace double occurrences as the last step (#11367)
* [Flax] Fix PyTorch import error (#11839)
* fix_torch_device_generate_test
* remove @
* change pytorch import to flax import
* Fix reference to XLNet (#11846)
* Switch mem metrics flag (#11851)
* Switch mem metrics flag
* Update src/transformers/training_args.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix flos single node (#11844)
* fixing flos bug/typo in non-distributed setting
* storing flos every logging_interval
* Fix two typos in docs (#11852)
* typo2
* fix typo
* [Trainer] Report both steps and num samples per second (#11818)
* [Trainer] Report both steps and num samples per second
* Fix batch number
* Update src/transformers/trainer_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Add some tests to the slow suite #11860
* Enable memory metrics in tests that need it (#11859)
* fixed a small typo in the doc (#11856)
* typo (#11858)
* Add option to log only once in multinode training (#11819)
* Add option to long only once in multinode training
* Use an alternate property
* [Wav2Vec2] SpecAugment Fast (#11764)
* first try
* finish
* [lm examples] fix overflow in perplexity calc (#11855)
* fix overflow in perplexity calc
* use inf
* fix
* [Examples] create model with custom config on the fly (#11798)
* create custom model on the flight
* better wording
* add update_from_string
* cleanup
* cleanup
* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* more bool options
* style
* fix logger
* add test
* add the doc
* assert on conflict of options
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Wav2Vec2ForCTC] example typo fixed (#11878)
* Ensure input tensor are on device. (#11874)
The feature extractor does not create tensors on the appropriate device,
so we call `ensure_tensor_on_device` before feeding the processed inputs
to the model.
* Fix usage of head masks by TF encoder-decoder models' `generate()` function (#11775)
* Fix Bart
* Fix Blenderbot{,_small}
* Fix LED
* Fix Marian
* Fix MBart
* Fix Pegasus
* Fix T5
* Add test for generation with head_mask
* Add a common TF test
* Override a test for the LED model as head masking is not yet properly implemented
* Remove all head_masks from input preparation for LED
* Drop masking for T5 as it needs a bit of refactor
* Correcting comments in T5Stack to reflect correct tuple order (#11330)
* Correcting comments to reflect correct tuple order
In order to match the actual order (line 513 and 516, and as accessed in 968), I've changed the order mentioned in comments L962 and L966-967.
* Update modeling_t5.py
Updating another comment as well
* Removing extra space
* Fixing style and quality
* style & quality
* Update src/transformers/models/t5/modeling_t5.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Allow dataclasses to be jitted (#11886)
* fix_torch_device_generate_test
* remove @
* change dataclasses to flax ones
* fix typo
* fix jitted tests
* fix bert & electra
* changing find_batch_size to work with tokenizer outputs (#11890)
* changing find_batch_size to work with tokenizer outputs
trainer_pt_utils.find_batch_size does not recognize the batch size of BatchEncoding objects. This can cause an error when a trainer relies on find_batch_size to report the number of observed examples in the evaluation loop.
* Trigger CI
Co-authored-by: jrenner <joseph.renner@inria.fr>
* Link official Cloud TPU JAX docs (#11892)
* Flax Generate (#11777)
* fix_torch_device_generate_test
* remove @
* add
* indexing
* correct a couple of tests
* fix tests
* add logits processor
* finish top_k, top_p, temp
* add docs
* correct flax prng key default
* improve generate
* add generation docs
* add docs
* make style
* revert model outputs change
* make style
* correct typo
* fix tests
* fix slow test
* add raise
* finish generation
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Add Emotion Speech Noteboook (#11900)
* Update deepspeed config to reflect hyperparameter search parameters (#11896)
* rebuild deepspeed config for hyperparameter search
* reformat code to fix style issues
* Adding new argument `max_new_tokens` for generate. (#11476)
* Adding new argument `max_new_tokens` for generate.
This is a proposal to add a new argument `max_new_tokens` to `generate`.
This include a `MaxNewTokensCriteria` that enables callers that don't
know about the token length ahead (like pipelines callers) to manage
more easily the length of their generated output.
* Adding a test for the user warning when both`max_length` and
`max_new_tokens` are used together.
* Removed redundant `no_grad`.
* Added Sequence Classification class in GPTNeo (#11906)
* seq classification changes
* fix tests
* [Flax] Return Attention from BERT, ELECTRA, RoBERTa and GPT2 (#11918)
* Added logic to return attention from flax-bert model and added test cases to check that
* Added new line at the end of file to test_modeling_flax_common.py
* fixing code style
* Fixing Roberta and Elextra models too from cpoying bert
* Added temporary hack to not run test_attention_outputs for FlaxGPT2
* Returning attention weights from GPT2 and changed the tests accordingly.
* last fixes
* bump flax dependency
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Test optuna and ray (#11924)
* Remove `datasets` submodule
* fix assert (#11935)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py` (#11920)
* Remove redundant `nn.log_softmax` in `run_flax_glue.py`
`optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference.
* Remove unused 'flax.linen' import
* Add MT5ForConditionalGeneration as supported arch. to summarization README (#11961)
* Add MT5ForConditionalGeneration as supported arch.
* Update README.md
* Add FlaxCLIP (#11883)
* add flax CLIP
* default input_shape
* add tests
* fix test
* fix name
* fix docs
* fix shapes
* attend at least 1 token
* flax conv to torch conv
* return floats
* fix equivalence tests
* fix import
* return attention_weights and update tests
* fix dosctrings
* address patricks comments
* input_shape arg
* add tests for get_image_features and get_text_features methods
* fix tests
* RAG-2nd2end-revamp (#11893)
* initial
* code quality test
* code quality
* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retreiver
* minor change in test_modeling_rag
* fixed tests
* Update examples/research_projects/rag-end2end-retriever/README.md
typo corrected as suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py
type change suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update src/transformers/models/rag/retrieval_rag.py
Adding this change as mentioned by lhoestq.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* completed the minor changes suggested by the reviewers
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* modify qa-trainer (#11872)
* modify qa-trainer
* fix flax model
* bugfixes training_args.py (#11922)
modified according to:
https://pytorch.org/xla/release/1.8.1/_modules/torch_xla/core/xla_model.html
* reinitialize wandb config for each hyperparameter search run (#11945)
* Add regression tests for slow sentencepiece tokenizers. (#11737)
* add test_vocab_size for sentencepiece tok.
* add test_get_vocab for sentencepiece tok.
* add test_convert_token_and_id for sentencepiece tok.
* add test_tokenize_and_convert_tokens_to_string for all tok.
* improve test_tokenize_and_convert_tokens_to_string for sp. tok.
* add common tokenizer integration tests
- for albert
- for barthez
* add tokenizer integration tests to bert gen.
* add most tokenizer integration tests
* fix camembert tokenizer integration test
* add tokenizer integration test to marian
* add tokenizer integration test to reformer
* add typing and doc to tokenizer_integration_test_util
* fix tokenizer integration test of reformer
* improve test_sentencepiece_tokenize_and_convert_tokens_to_string
* empty commit to trigger CI
* fix tokenizer integration test of reformer
* remove code not needed anymore
* empty commit to trigger CI
* empty commit to trigger CI
* Authorize args when instantiating an AutoModel (#11956)
* Neptune.ai integration (#11937)
An option that turns on neptune.ai logging
--report_to 'neptune'
Additional ENV variables:
NEPTUNE_PROJECT
NEPTUNE_API_TOKEN
NEPTUNE_RUN_NAME (optional)
NEPTUNE_STOP_TIMEOUT (optional)
* Run the integration tests on schedule tests instead of master tests
* [deepspeed] docs (#11940)
* deepspeed docs
* cleanup
* cleanup
* typo correction (#11973)
* typo correction
* type corrections
* ByT5 model (#11971)
* allow tf to use uneven num of layers
* add tokenizer
* finish docs
* finish docs
* Apply suggestions from code review
* include in index
* finish
* Update docs/source/model_doc/byt5.rst
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* apply sylvais suggestions
* make style
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Typo in usage example, changed to device instead of torch_device (#11979)
* [DeepSpeed] decouple `DeepSpeedConfigHF` from `Trainer` (#11966)
* decouple DeepSpeedConfigHF from Trainer
* add LoggingLevel ctx manager; add new test
* cleanup
* add docs
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* implemented suggested renames
* formatter workaround
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Trainer] add train loss and flops metrics reports (#11980)
* add train loss and flops metrics reports
* consistency
* add train_loss to skip keys
* restore on_train_end call timing
* Bump urllib3 from 1.25.8 to 1.26.5 in /examples/research_projects/lxmert (#11983)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.25.8 to 1.26.5.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/1.25.8...1.26.5)
---
updated-dependencies:
- dependency-name: urllib3
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [RAG] Fix rag from pretrained question encoder generator behavior (#11962)
* fix_torch_device_generate_test
* remove @
* fix rag from pretrained loading
* add test
* uplaod
* finish
* VisualBERT (#10534)
* Init VisualBERT
* Add cookie-cutter, Config, and Embeddings
* Add preliminary Model
* Add Bert analogous classes
* Add basic code for NLVR, VQA, Flickr
* Update Init
* Fix VisualBert Downstream Models
* Rename classifier to cls
* Comment position_ids buffer
* Remove sentence image predictor output
* Update output dicts
* Remove unnecessary files
* Fix Auto Modeling
* Fix transformers init
* Add conversion script
* Add conversion script
* Fix docs
* Update visualbert modelling
* Update configuration
* Style fixes
* Add model and integration tests
* Add all tests
* Update model mapping
* Add simple detector from original repository
* Update docs and configs
* Fix style
* Fix style
* Update docs
* Fix style
* Fix import issues in style
* Fix style
* Add changes from review
* Fix style
* Fix style
* Update docs
* Fix style
* Fix style
* Update docs/source/model_doc/visual_bert.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes from review
* Remove convert run script
* Add changes from review
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/visual_bert/modeling_visual_bert.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add changes from review
* Add changes from review
* Add visual embedding example in docs
* Fix "copied from" comments
* Add changes from review
* Fix error, style, checkpoints
* Update docs
* Fix integration tests
* Fix style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix examples (#11990)
* [docs] fix xref to `PreTrainedModel.generate` (#11049)
* fix xref to generate
* do the same for search methods
* style
* style
* Update return introduction (#11976)
Make it clear that the `forward` method now returns a dict instead of tuple.
Fix style
* [deepspeed] Move code and doc into standalone files (#11984)
* move code and docs
* style
* moved
* restore
* [deepspeed] add nvme test skip rule (#11997)
* add nvme skip rule
* fix
* Fix weight decay masking in `run_flax_glue.py` (#11964)
* Fix weight decay masking in `run_flax_glue.py`
Issues with the previous implementation:
- The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
- `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
- Flax's LayerNorm calls the scale parameter `scale` not `weight`
* Fix formatting with black
* adapt results
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* [Flax] Refactor MLM (#12013)
* fix_torch_device_generate_test
* remove @
* finish refactor
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* [Deepspeed] Assert on mismatches between ds and hf args (#12021)
* wip
* add mismatch validation + test
* renames
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* renames
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [TrainerArguments] format and sort __repr__, add __str__ (#12018)
* format and sort __repr__, add __str__
* typo
* use __str__ directly
* alias __repr__ = __str__
* Fixed Typo in modeling_bart.py (#12035)
* Fixed Typo in modeling_bart.py - Issue #11895
* Fixed Typo in modeling_bart.py
* fix deberta 2 tokenizer integration test (#12017)
* fix docs of past_key_values (#12049)
* [JAX] Bump jax lib (#12053)
* fix_torch_device_generate_test
* remove @
* bump up jax lib
* Fixes bug that appears when using QA bert and distilation. (#12026)
* Fixing bug that appears when using distilation (and potentially other uses).
During backward pass Pytorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails.
* Fixing all models QA clamp_ bug.
* Extend pipelines for automodel tupels (#12025)
* fix_torch_device_generate_test
* remove @
* finish
* refactor
* add test
* fix test
* Attempt at simplification.
* Small fix.
* Fixing non existing AutoModel for TF.
* Naming.
* Remove extra condition.
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Add optional grouped parsers description to HfArgumentParser (#12042)
* Adding optional argument group to HfArgumentParser
* Minor
* remove whitespace
* Minor styling
* adds metric prefix. (#12057)
* adds metric prefix.
* update tests to include prefix
* skip failing test (#12059)
* Fix integration tests (#12066)
* Fix tapas issue (#12063)
* Fix scatter function to be compatible with torch-scatter 2.7.0
* Allow test again
* updated the original RAG implementation to be compatible with latest Pytorch-Lightning (#11806)
* updated the original RAG implementation to be compatible with the latest PL version
* updated the requirements.txt file
* execute make style
* code quality test
* code quality
* conflix resolved in requirement.txt
* code quality
* changed the MyDDP class name to CustomDDP
* Replace legacy tensor.Tensor with torch.tensor/torch.empty (#12027)
* Replace legacy torch.Tensor constructor with torch.{tensor, empty}
* Remove torch.Tensor in examples
* Add torch to requirements.txt in language-modeling (#12040)
* Add torch to requirements.txt in language-modeling
* Update examples/pytorch/language-modeling/requirements.txt
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Properly indent block_size (#12070)
* [Deepspeed] various fixes (#12058)
* replace deprecated config
* sub_group_size was too big
* complete deprecation removal
* [Deepspeed Wav2vec2] integration (#11638)
* wip
* wip - but working with https://github.com/microsoft/DeepSpeed/pull/1044
* cleanup
* workaround
* working 5/8 modes
* solve fp32 distributed zero3
* style
* sync
* sync
* rework
* deprecation
* cleanup
* https://github.com/microsoft/DeepSpeed/pull/1044 pr was merged
* clean up
* add a guide
* more prose
* more prose
* fix
* more prose
* sub_group_size was too big
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* refactor
* bug fix
* make the true check explicit
* new deepspeed release
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* typo
* Update run_ner.py with id2label config (#12001)
* sync LayerDrop for Wav2Vec2Encoder + tests (#12076)
* Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to testsé
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* [test] support more than 2 gpus (#12074)
* support more than 2 gpus
* style
* Wav2Vec2 Pretraining (#11306)
* Working quantizer forward
* Working quantizer forward
* Clean up unused model parts, test reproducibility
* Working quantizer forward
* Clean up unused model parts, test reproducibility
* Remove custom outputs from the shared ones
* correct conversion
* correct bug
* add first pretrain script
* save intermediate
* static shapes
* save intermediate
* finish first pretrain script version
* more refactor
* remove wanddb
* refactor more
* improve test
* correct perplexity compute bug
* finish model implementation
* add to docs
* finish docs
* finish pretraining script
* finish pretraining script
* remove wandb
* finish PR for merge
* finish config
* finish
* make deepspeed work
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply suggestions
* fix flaky test
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* pass decay_mask fn to optimizer (#12087)
* rm require_version_examples (#12088)
* [Wav2Vec2ForPretraining] Correct checkpoints wav2vec2 & fix tests (#12089)
* fix_torch_device_generate_test
* remove @
* fix tests
* Add text_column_name and label_column_name to run_ner and run_ner_no_trainer args (#12083)
* Add text_column_name and label_column_name to run_ner args
* Minor fix: grouping for text and label column name
* CLIPFeatureExtractor should resize images with kept aspect ratio (#11994)
* Resize with kept aspect ratio
* Fixed failed test
* Overload center_crop and resize methods instead
* resize should handle non-PIL images
* update slow test
* Tensor => tensor
Co-authored-by: patil-suraj <surajp815@gmail.com>
* New TF GLUE example (#12028)
* Pushing partially-complete new GLUE example
* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.
* Fix to the fit() call
* Bugfixes, making sure TPU and multi-GPU support is ready
* Remove logger line that depends on Pytorch
* Style pass
* Deleting old TF GLUE example
* Include label2id and id2label in the saved model config
* Don't clobber the existing model.config.label2id
* Style fixes
* Update examples/tensorflow/text-classification/run_glue.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix quality
* Update README.md to cover the TF GLUE example.
* Minor style edits
* Appending label2id and id2label to models to ensure inference works properly (#12102)
* Fix a condition in test_generate_with_head_masking (#11911)
* Fix a condition in test_generate_with_head_masking
* Fix usage of head_mask in bigbirg_pegasus
* Fix head masking for speech2text
* Resolve copy mismatch + drop unwanted print statement
* Fix the condition
* Flax VisionTransformer (#11951)
* adding vit for flax
* added test for Flax-vit and some bug-fixes
* overrided methods where variable changes were necessary for flax_vit test
* added FlaxViTForImageClassification for test
* Update src/transformers/models/vit/modeling_flax_vit.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* made changes suggested in PR
* Adding jax-vit models for autoimport
* swapping num_channels and height,width dimension
* fixing the docstring for torch-like inputs for VIT
* add model to main init
* add docs
* doc, fix-copies
* docstrings
* small test fixes
* fix docs
* fix docstr
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add relevant description to tqdm in examples (#11927)
* add relevant `desc` in examples
* require_version datasets>=1.8.0
* Fix head masking generate tests (#12110)
* fix_torch_device_generate_test
* remove @
* fix tests
* Flax CLM script (#12023)
* first draft
* max_seq_length => block_size
* fix arg names
* fix typos
* fix loss calculation
* add max examples, fix train eval steps, metrics
* optimizer mask
* fix perpelexity, metric logging
* fix logging
* data_collator = > data_loader
* refactor loss_fn
* support single GPU
* pass distributed to write_metric
* fix jitting
* fix single device training
* fix single device metrics
* close inner progress bars once finished
* add overwrite_cache arg
* ifx dataset caching issue
* add more logs
* few small fixes,
* address nicholas suggestions
* fix docstr
* address patricks suggestions
* make flake happy
* pass new new_dropout_rng to apply_gradients
* reset train metrics after every epoc
* remove distributed logis, small fixes
* Add from_pretrained to dummy timm objects (#12097)
* Add from_pretrained to dummy timm
* Fix at the source
* Update utils/check_dummies.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Missing pretrained dummies
* Style
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix t5 error message (#12136)
* Fix t5 error message
* Fix again
* Fix megatron_gpt2 attention block's causal mask (#12007)
* Fix megatron_gpt2 attention block's causal mask.
* compatibility with checkpoints created with recent versions of Megatron-LM
* added integration test for the released Megatron-GPT2 model
* code style changes
* added option to megatron conversion script to read from config file
Co-authored-by: Guido Novati <gnovati@nvidia.com>
* Add mlm pretraining xla torch readme (#12011)
* fix_torch_device_generate_test
* remove @
* upload
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/flax/language-modeling/README.md
* add more info
* finish
* fix
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* add readme for flax clm (#12111)
* add readme for flax clm
* use section link for tokenizer
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update metrics
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* FlaxBart (#11537)
* Start working on FlaxBart
* Create modeling_flax_bart.py
* Write FlaxBartAttention
* Add FlaxBartEncoderLayer
* Add FlaxBartDecoderLayer and some typing
* Add helepr function for FlaxBart
* shift_tokens_right
* _make_causal_mask
* _expand_mask
* Add PositionalEmbedding and fix init_std naming
* Add FlaxBartPretrainedModel
* Add FlaxBartEncoder
* Add FlaxBartEncoder
* Add FlaxBartEncoder among modules to be imported
* YET WE CANNOT INITIALIZE THAT!! :(
* Make BartEncoder working
Change BartEncoder to instance of nn.Module so far
* Add FlaxBartDecoder
* Add FlaxBartModel
* TODO to make model run -> Prepapre model inputs
* Resolve padding
* Add FlaxBartModel
* Add FlaxBartModel into importable modules
* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
* make style; not properly working
* make style; make quality not pass due to some import I left
* Remove TODO for padding_idx in nn.Embed so far
* Add FlaxBartForConditionalGeneration
* Incorporate Flax model output classes, i.e. return_dict
* Add another models and incorporate use_cache arg
* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
* Incorporate use_cache arg from PyTorch implementation
* Add all necessary Flax output utils
* Add FlaxBartForCausalLM; not working yet'
* Add minor improvements; still lacks some functionality
* Update docs, src and tests
* Add support of FlaxBart to docs/source
* Fix some bugs in FlaxBart souce code
* Add some neccessary tests for FlaxBart models - jit_compilation not passing
* Fix tests and add test_head_masking
* Fix tests for @jax.jit computation
* Add test_head_masking
* Migrate FlaxBart tests from jax.numpy to numpy
* Remove FlaxBartForCausalLM
* Clean repo
* fix bart model weight structure
* Fix FlaxBartForSequenceClassification
Slicing is not possible to use below jit, therefore, selecting sentence
representation from hidden_states must be changed.
* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
* Allow testing for FlaxBartForQA for pt_flax equivalence
* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
* remove past_key_values
* remove inputs_mebeds and make input_ids required
* add position ids
* re-write attention layer
* fix dataclass
* fix pos embeds and attention output
* fix pos embeds
* expose encode method
* expose decode method
* move docstring to top
* add cache for causal attn layer
* remove head masking for now
* s2s greedy search first pass
* boom boom
* fix typos
* fix greedy generate for bart
* use encoder, decoder layers instead of num_hidden_layers
* handle encoder_outputs
* cleanup
* simplify decoding
* more clean-up
* typos
* Change header + add {decoder_,}position_ids into 2 models
* add BartConfig
* fix existing tests
* add encode, decode methods
* Fix shift_tokens_right for JIT compilation + clarify one condition
* fix decode
* encoder => encode
* simplify generate
* add tests for encode and decode
* style
* add tests for cache
* fix equivalence tests
* sample generate now works with seq2seq
* generation tests
* initialize dense layers
* docstring and cleanup
* quality
* remove get/set input_embeddings
* address Patricks suggestions
* decode for every model, remove encoder_outputs from call
* update tests accordingly
* decode returns only decoder outputs and logits
* fix arguments
* doc encode, decode methods
* correct base_model_prefix
* fix test for seq classif model
* fix docs
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Feature to use the PreTrainedTokenizerFast class as a stand-alone tokenizer (#11810)
* feature for tokenizer without slow/legacy version
* format
* modify common test
* add tests
* add PreTrainedTokenizerFast to AutoTokenizer
* format
* change tokenizer common test in order to be able to run test without a slow version
* update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`
* add autokenizer test
* replace `if self.tokenizer_class is not None` with ` if self.tokenizer_class is None`
* remove obsolete change in comment
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* change `get_main_tokenizer` into `get_tokenizers`
* clarify `get_tokenizers` method
* homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`
* add `test_rust_tokenizer = False` to tokenizer which don't define a fast version
* `test_rust_tokenizer = False` for BertJapaneseTokenizer
* `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Flax] Add links to google colabs (#12146)
* fix_torch_device_generate_test
* remove @
* add colab links
* Don't log anything before logging is setup in examples (#12121)
* Don't log anything before logging is setup in examples
* Last example
* Use text_column_name variable instead of "text" (#12132)
* Use text_column_name variable instead of "text"
`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.
This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90% + of datasets use "text" anyway.
* black formatting
* make style
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
* [lm examples] Replicate --config_overrides addition to other LM examples (#12135)
* [lm examples] Replicate --config_overrides addition to other LM examples
* Removing no trainer files changes
* Update README
Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
* fix error message (#12148)
* [optim] implement AdafactorSchedule (#12123)
* implement AdafactorSchedule
* typo
* fix
* Update src/transformers/optimization.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [style] consistent nn. and nn.functional (#12124)
* consistent nn. and nn.functional
* fix glitch
* fix glitch #2
* Adding TFWav2Vec2Model (#11617)
* [WIP] Add TFWav2Vec2Model
Work in progress for adding a tensorflow version of Wav2Vec2
* feedback changes
* small fix
* Test Feedback Round 1
* Add SpecAugment and CTC Loss
* correct spec augment mask creation
* docstring and correct copyright
* correct bugs
* remove bogus file
* finish tests correction
* del unnecessary layers
* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* correct final bug
* Feedback Changes
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Fix flax pt equivalence tests (#12154)
* fix_torch_device_generate_test
* remove @
* upload
* consistent nn. and nn.functional: p2 templates (#12153)
* Flax Big Bird (#11967)
* add flax bert
* bert -> bigbird
* original_full ported
* add debugger
* init block sparse
* fix copies ; gelu_fast -> gelu_new
* block sparse port
* fix block sparse
* block sparse working
* all ckpts working
* fix-copies
* make quality
* init tests
* temporary fix for FlaxBigBirdForMultipleChoice
* skip test_attention_outputs
* fix
* gelu_fast -> gelu_new ; fix multiple choice model
* remove nsp
* fix sequence classifier
* fix
* make quality
* make fix-copies
* finish
* Delete debugger.ipynb
* Update src/transformers/models/big_bird/modeling_flax_big_bird.py
* make style
* finish
* bye bye jit flax tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [style] consistent nn. and nn.functional: part 3 `tests` (#12155)
* consistent nn. and nn.functional: p3 templates
* restore
* [style] consistent nn. and nn.functional: part 4 `examples` (#12156)
* consistent nn. and nn.functional: p4 examples
* restore
* consistent nn. and nn.functional: part 5 docs (#12161)
* Add video links to the documentation (#12162)
* [Flax generate] Add params to generate (#12171)
* fix_torch_device_generate_test
* remove @
* add params as input
* finish
* Use a released version of optax rather than installing from Git. (#12173)
Use a released version of optax rather than installing from Git
* Have dummy processors have a `from_pretrained` method (#12145)
* Add course banner (#12157)
* Add course banner
* Update course banner
* Adjust banner width
* Enable add_prefix_space if model_type is roberta or gpt2 (#12116)
* Update AutoModel classes in summarization example (#12178)
- Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
- Add newly required `truncation=True` to `tokenizer.encode` with `max_length`
This silences all warnings.
* Ray Tune Integration Updates (#12134)
* fix
* fixes
* add back to scheduled tests
* formatting
* Update integrations.py
* [testing] ensure concurrent pytest workers use a unique port for torch.dist (#12166)
* ensure concurrent pytest workers use a unique port for torch.distributed.launch
* reword
* Model card defaults (#12122)
* [WIP] Model card defaults
* finetuned_from default value
* Add all mappings to the mapping file
* Be more defensive on finetuned_from arg
* Add default task tag
* Separate tags from tasks
* Edge case for dataset
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Temporarily deactivate torch-scatter while we wait for new release (#12181)
* Temporarily deactivate torch-scatter while we wait for new release
* torch-1.8.1 binary for scatter
* Revert to 1.8.0
* Pin torch dependency
* torchaudio and torchvision
* Temporarily deactivate torchhub test (#12184)
* [Flax] Add Beam Search (#12131)
* fix_torch_device_generate_test
* remove @
* push new logit processors
* add processors
* save first working version
* save intermediate
* finish
* make style
* make fix-copies
* finish
* Update tests/test_modeling_flax_bart.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Hubert (#11889)
* fix_torch_device_generate_test
* remove @
* add hubert
* add first test file
* more docs
* fix bugs
* fix bug
* finish
* finish
* finish docstring
* fix
* fix
* finalize
* add to ignored
* finish
* Apply suggestions from code review
* correct naming
* finish
* fix auto config
* finish
* correct convert script
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* apply suggestions lysandre & suraj
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* updated DLC images and sample notebooks (#12191)
* Enabling AutoTokenizer for HubertConfig. (#12198)
* Use yaml to create metadata (#12185)
* Use yaml to create metadata
* Fix typo
* Remove pin
* [Docs] fixed broken link (#12205)
* fixed broken link
* Update docs/source/benchmarks.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update docs/source/benchmarks.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Pipeline update & tests (#12207)
* Improve detr (#12147)
* Remove unused variables
* Improve docs
* Fix docs of segmentation masks
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add link to the course (#12229)
* Support for torch 1.9.0 (#12224)
* Support for torch 1.9.0
* Torch scatter for 1.9.0
* Github Actions run on 1.9.0
* fix pt-1.9.0 `add_` deprecation (#12217)
* fix pt-1.9.0 add_ deprecation
* add () for clarity
* Trigger CI
* require_version(torch
* Release: v4.7.0
* Docs for v4.8.0
* AutoTokenizer: infer the class from the tokenizer config if possible (#12208)
* AutoTokenizer: infer the class from the tokenizer config if possible
* Add tests
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update desc for map in all examples (#12226)
* update desc for map in all examples
* added plm
* suggestions
* [Flax] FlaxAutoModelForSeq2SeqLM (#12228)
* add FlaxAutoModelForSeq2SeqLM
* [FlaxBart] few small fixes (#12247)
* boom boom
* remove flax clip example
* few small fixes
* Depreciate pythonic Mish and support PyTorch 1.9 version of Mish (#12240)
* Moved Mish to Torch 1.9 version
* Run black formatting
* [t5 doc] make the example work out of the box (#12239)
* [run_clm.py] restore caching
* style
* [t5 doc] make the example work out of the box
This PR expands the training example to include the correct model type for the example to work, e.g. with `T5Model` this example will break.
* Update docs/source/model_doc/t5.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* expand the other example
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Fix the scheduled CI
* Better CI feedback (#12279)
* Better run ID
* Only part of CI
* Revert "Only part of CI"
This reverts commit 29f7f248d21e0f5792e0670ba8705b31ad8967b7.
* Fix for making student ProphetNet for Seq2Seq Distillation (#12130)
* make_student.py: fix to make student ProphetNet
* reformat
* [FlaxClip] fix test from/save pretrained test (#12284)
* boom boom
* remove flax clip example
* fix from_save_pretrained
* [Flax] [WIP] allow loading head model with base model weights (#12255)
* boom boom
* remove flax clip example
* allow loading head model with base model weights
* add test
* fix imports
* disable save, load test for clip
* add test_save_load_to_base
* [DeepSpeed] don't ignore --adafactor (#12257)
* [Flax] Fix flax test save pretrained (#12256)
* fix_torch_device_generate_test
* remove @
* fix flax save pretrained test
* Tensorflow QA example (#12252)
* New Tensorflow QA example!
* Style pass
* Updating README.md for the new example
* flake8 fixes
* Update examples/tensorflow/question-answering/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [Flax] Add jax flax to env command (#12251)
* fix_torch_device_generate_test
* remove @
* add commands for flax/jax
* reset report_to to none, avoid deprecation warning (#12293)
* [trainer + examples] set log level from CLI (#12276)
* set log level from CLI
* add log_level_replica + test + extended docs
* cleanup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* rename datasets objects to allow datasets module
* improve the doc
* style
* doc improve
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [tests] multiple improvements (#12294)
* [tests] multiple improvements
* cleanup
* style
* todo to investigate
* fix
* Fix for the issue of device-id getting hardcoded for token_type_ids during Tracing [WIP] (#11252)
* registering a buffer for token_type_ids, to pass the error of device-id getting hardcoded when tracing
* sytle format
* adding persistent flag to the resgitered buffers that prevent from adding them to the state_dict and addresses the Backward compatibility issue
* adding the try catch to the fix as persistent flag is only available from PT >1.6
* adding version check
* added the condition to only use the token_type_ids buffer when its autogenerated not passed by user
* adding comments and making the conidtion where token_type_ids are None to use the registered buffer
* taking out position-embeddding from the if block
* adding comments
* handling the case if buffer for position_ids was not registered
* reverted the changes on position_ids, fix the issue with size of token_type_ids buffer, moved the modification for generated token_type_ids to Bertmodel, instead of Embeddings
* reverting the token_type_ids in case of None to the previous version
* reverting changes on position_ids adding back the if block
* changes added by running make fix-copies
* changes added by running make fix-copies and added the import version as it was getting used
* changes added by running make fix-copies
* changes added by running make fix-copies
* fixing the import format
* fixing the import format
* modified to use temp tensor for trimed and expanded token_type_ids buffer
* changes made by fix-copies after temp tensor modifications
* changes made by fix-copies after temp tensor modifications
* changes made by fix-copies after temp tensor modifications
* clean up
* clean up
* clean up
* clean up
* Nit
* Nit
* Nit
* modified according to support device conversion on traced models
* modified according to support device conversion on traced models
* modified according to support device conversion on traced models
* modified according to support device conversion on traced models
* changes based on latest in master
* Adapt templates
* Add version import
Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* trainer_tf: adjust wandb installation command (#12291)
* add FlaxAutoModelForImageClassification in main init (#12298)
* Fix and improve documentation for LEDForConditionalGeneration (#12303)
* Replace conditional generation example (fixes#12268)
* Replace model in summarization example with finetuned checkpoint, adapt example text
* Fix typo in new summarization example
* Fix docstring formatting, add missing import statement to example
* [Flax] Main doc for event orga (#12305)
* fix_torch_device_generate_test
* remove @
* push
* finish
* some typos
* add more info on communication
* add suggestions
* [trainer] 2 bug fixes and a rename (#12309)
* bug fixes and a rename
* add extended DDP test
* FlaxBartPretrainedModel -> FlaxBartPreTrainedModel (#12313)
* [docs] performance (#12258)
* initial performance document
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* rewrites based on suggestions
* 8x multiple is for AMP only
* add contribute section
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add CodeCarbon Integration (#12304)
* Add optional dependency
* Add CodeCarbon integration
* Add CodeCarbon integration
* Add CodeCarbon integration
* typo
* Optimizing away the `fill-mask` pipeline. (#12113)
* Optimizing away the `fill-mask` pipeline.
- Don't send anything to the tokenizer unless needed. Vocab check is
much faster
- Keep BC by sending data to the tokenizer when needed. User handling warning messages will see performance benefits again
- Make `targets` and `top_k` work together better `top_k` cannot be
higher than `len(targets)` but can be smaller still.
- Actually simplify the `target_ids` in case of duplicate (it can happen
because we're parsing raw strings)
- Removed useless code to fail on empty strings. It works only if empty
string is in first position, moved to ignoring them instead.
- Changed the related tests as only the tests would fail correctly
(having incorrect value in first position)
* Make tests compatible for 2 different vocabs... (at the price of a
warning).
Co-authored-by: @EtaoinWu
* ValueError working globally
* Update src/transformers/pipelines/fill_mask.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* `tokenizer.vocab` -> `tokenizer.get_vocab()` for more compatiblity +
fallback.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add output in a dictionary for TF `generate` method (#12139)
* Add output args to greedy search
* Fix critical typo + make style quality
* Handle generate_beam_search
* Add dict_specific tests and fix the placement of encoder outputs
* Add specific outputs
* Update doc
* Fix typo
* Adjust handling encoder_outputs + Fix generating for T5
* Fix generate for RAG
* Fix handling ouptut_attentions when target_mapping is not None
Take care of situations when target_mapping is provided
as there are 2-tuple of attentions
Change from:
if inputs["output_attentions"]:
attentions = tuple(tf.transpose(t, perm(2, 3, 0, 1)) for t in attentions)
to:
if inputs["output_attentions"]:
if inputs["target_mapping"] is not None:
# when target_mapping is provided, there are 2-tuple of attentions
attentions = tuple(
tuple(tf.transpose(attn_stream, perm=(2, 3, 0, 1)) for attn_stream in t) for t in attentions
)
else:
attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)
* Rename kwargs to model_kwargs
* make style quality
* Move imports in test_modeling_tf_common.py
Move ModelOutput-related imports in test_modeling_tf_common.py
into the `is_tf_available():` statement.
* Rewrite nested if-statements
* Fix added tests
* Flax summarization script (#12230)
* add summrization script
* fix arguments, preprocessing, metrics
* add generation and metrics
* auto model, prediction loop
* prettify
* label smoothing
* adress Sylvain and Patricks suggestions
* dynamically import shift_tokens_right
* fix shift_tokens_right_fn call
* Rewrite ProphetNet to adapt converting ONNX friendly (#11981)
* Rewrite
* [ONNX] rewrite
* Flax T5 (#12150)
* copy pytorch-t5
* init
* boom boom
* forward pass same
* make generation work
* add more tests
* make test work
* finish normal tests
* make fix-copies
* finish quality
* correct slow example
* correct slow test
* version table
* upload models
* Update tests/test_modeling_flax_t5.py
* correct incorrectly deleted line
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Add mention of the huggingface_hub methods for offline mode (#12320)
* [Flax/JAX] Add how to propose projects markdown (#12311)
* fix_torch_device_generate_test
* remove @
* finish
* make style
* [TFWav2Vec2] Fix docs (#12283)
* fix error
* make style check happy
Co-authored-by: chenhaitao <chenhaitao@qiyi.com>
* Clean push to hub API (#12187)
* Clean push to hub API
* Create working dir if it does not exist
* Different tweak
* New API + all models + test Flax
* Adds the Trainer clean up
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* (nit) output types
* No need to set clone_from when folder exists
* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Add generated_from_trainer tag
* Update to new version
* Fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Add all XxxPreTrainedModel to the main init (#12314)
* Add all XxxPreTrainedModel to the main init
* Add to template
* Add to template bis
* Add FlaxT5
* Conda build (#12323)
* Temporarily revert the `fill-mask` improvements.
* changed modeling_fx_utils.py to utils/fx.py for clarity (#12326)
Co-authored-by: Michael Benayoun <michael@huggingface.co>
* Pin good version of huggingface_hub
* [Flax T5] Fix weight initialization and fix docs (#12327)
* finish t5 flax fixes
* improve naming
* Release: v4.8.0
* v4.9.0.dev0
* Update training_args.py (#12328)
mention in `save_strategy` param description that `load_best_model_at_end` can override
* [Deepspeed] new docs (#12077)
* document sub_group_size
* style
* install + issues reporting
* style
* style
* Update docs/source/main_classes/deepspeed.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* indent 4
* restore
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix default to logging_dir lost in merge conflict
* try-this (#12338)
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
* [examples/Flax] move the examples table up (#12341)
* Fix torchscript tests (#12336)
* Fix torchscript tests
* Better test
* Remove bogus print
* Document patch release v4.8.1
* Add flax/jax quickstart (#12342)
* Update README.md
* fixed typo (#12356)
* Fix exception in prediction loop occurring for certain batch sizes (#12350)
* fix distributed_concat for scalar outputs
* Update README.md
* fixed typo (#12356)
* simplify fix with terser syntax
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Trigger CI
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: michal pitr <21157924+MichalPitr@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add FlaxBigBird QuestionAnswering script (#12233)
* port bigbird script
* adapt script a bit
* change location
* adapt more
* save progress
* init commit
* style
* dataset script tested
* readme add
* Replace NotebookProgressReporter by ProgressReporter in Ray Tune run (#12357)
* Replace NotebookProgressReporter by ProgressReporter in Ray Tune run
* Move to local import
* Style
* remove extra white space from log format (#12360)
* fixed multiplechoice tokenization (#12362)
* fixed multiplechoice tokenization
The model would have seen two sequences:
1. [CLS]prompt[SEP]prompt[SEP]
2. [CLS]choice0[SEP]choice1[SEP]
that is not correct as we want a contextualized embedding of prompt and choice
* removed outer brackets for proper sequence generation
* [trainer] add main_process_first context manager (#12351)
* main_process_first context manager
* handle multi-node, add context description
* sync desc
* [Examples] Replicates the new --log_level feature to all trainer-based pytorch (#12359)
* added log_level
* fix comment
* fixed log_level
* Trigger CI
* Unfied logging
* simplified args for log_level
* updated example template (#12365)
* replace print with logger (#12368)
* [Documentation] Warn that DataCollatorForWholeWordMask is limited to BertTokenizer-like tokenizers (#12371)
* Notify users that DataCollatorForWholeWordMask is limited to BertTokenier-like tokenizers
* Fix code formatting
* Update run_mlm.py (#12344)
Before the code could not be used for validation only because of this line:
extension = data_args.train_file.split(".")[-1]
was assuming that extension must be extracted from the training dataset. This line would run regardless of the training or validation options of the user. This would lead to an error if the user only wants to run an evaluation only and does not want to do train (because the training file does not exist). I modified it to extract extension from the training file if the user wants to do train and extract it from the validation file if the user wants to run eval. This way the code can be used for both training and validation separately.
* Add possibility to maintain full copies of files (#12312)
* [CI] add dependency table sync verification (#12364)
* add dependency table sync verification
* improve the message
* improve the message
* revert
* ready to merge
* [Examples] Added context manager to datasets map (#12367)
* added cotext manager to datasets map
* fixed style and spaces
* fixed warning of deprecation
* changed desc
* [Flax community event] Add more description to readme (#12398)
* fix_torch_device_generate_test
* remove @
* boom boom
* correct typos
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Apply suggestions from code review
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>
* Update README.md
* Fix copies
* Remove the need for `einsum` in Albert's attention computation (#12394)
* debug albert einsum
* Fix matmul computation
* Let's use torch linear layer.
* Style.
* [Flax] Adapt flax examples to include `push_to_hub` (#12391)
* fix_torch_device_generate_test
* remove @
* finish
* correct summary writer
* correct push to hub
* fix indent
* finish
* finish
* finish
* finish
* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Tensorflow LM examples (#12358)
* Tensorflow MLM example
* Add CLM example
* Style fixes, adding missing checkpoint code from the CLM example
* Fix TPU training, avoid massive dataset warnings
* Fix incorrect training length calculation for multi-GPU training
* Fix incorrect training length calculation for multi-GPU training
* Refactors and nitpicks from the review
* Style pass
* Adding README
* pass the matching trainer log level to deepspeed (#12401)
* [Flax] Add T5 pretraining script (#12355)
* fix_torch_device_generate_test
* remove @
* add length computatan
* finish masking
* finish
* upload
* fix some bugs
* finish
* fix dependency table
* correct tensorboard
* Apply suggestions from code review
* correct processing
* slight change init
* correct some more mistakes
* apply suggestions
* improve readme
* fix indent
* Apply suggestions from code review
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* correct tokenizer
* finish
* finish
* finish
* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* [models] respect dtype of the model when instantiating it (#12316)
* [models] respect dtype of the model when instantiating it
* cleanup
* cleanup
* rework to handle non-float dtype
* fix
* switch to fp32 tiny model
* improve
* use dtype.is_floating_point
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix the doc
* recode to use explicit torch_dtype_auto_detect, torch_dtype args
* docs and tweaks
* docs and tweaks
* docs and tweaks
* merge 2 args, add docs
* fix
* fix
* better doc
* better doc
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Rename detr targets to labels (#12280)
* Rename target to labels in DetrFeatureExtractor
* Update DetrFeatureExtractor tests accordingly
* Improve docs of DetrFeatureExtractor
* Improve docs
* Make style
* Add out of vocabulary error to ASR models (#12288)
* Add OOV error to ASR models
* Feedback changes
* Fix TFWav2Vec2 SpecAugment (#12289)
* Fix TFWav2Vec2 SpecAugment
* Invert masks
* Feedback changes
* [example/flax] add summarization readme (#12393)
* add readme
* update readme and add requirements
* Update examples/flax/summarization/README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Example scripts - correct weight decay (#12409)
* fix_torch_device_generate_test
* remove @
* finish
* finish
* correct style
* fix ids_to_tokens naming error in tokenizer of deberta v2 (#12412)
Co-authored-by: Jipeng Huang <jihuan@microsoft.com>
* minor fixes in original RAG training (#12395)
* Added talks (#12415)
* Easily train a new fast tokenizer from a given one (#12361)
* [WIP] Easily train a new fast tokenizer from a given one
* Fix test
* Roll out to other tokenizers and add tests
* Fix bug with unk id and add emoji to test
* Really use something different in test
* Implement special tokens map
* Map special tokens in the Transformers tokenizers
* Fix test
* Make test more robust
* Fix test for BPE
* More robust map and test
Co-authored-by SaulLu
* Test file
* Stronger tests
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
* Map unk token for Wordpiece and address review comment
* Fix lowercase test and address review comment
* Fix all tests
* Simplify test
* Fix tests for realsies
* Easily train a new fast tokenizer from a given one - tackle the special tokens format (str or AddedToken) (#12420)
* Propose change in tests regarding lower case
* add new test for special tokens types
* put back the test part about decoding
* add feature: the AddedToken is re-build with the different mapped content
* Address review comment: simplify AddedToken building
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* [modelcard] fix (#12422)
this PR is fixing an incorrect attribute - probably some tests are needed?
* Add option to save on each training node (#12421)
* Add option to save on each training node
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Added to talks section (#12433)
Added one more confirmed speaker, zoom links and gcal event links
* Fix default bool in argparser (#12424)
* Fix default bool in argparser
* Add more to test
* Add default bos_token and eos_token for tokenizer of deberta_v2 (#12429)
* fix ids_to_tokens naming error in tokenizer of deberta v2
* Update tokenization_deberta_v2.py
Add bos_token and eos_token.
* format code
Co-authored-by: Jipeng Huang <jihuan@microsoft.com>
* Add CANINE (#12024)
* First pass
* More progress
* Add support for local attention
* More improvements
* More improvements
* Conversion script working
* Add CanineTokenizer
* Make style & quality
* First draft of integration test
* Remove decoder test
* Improve tests
* Add documentation
* Mostly docs improvements
* Add CanineTokenizer tests
* Fix most tests on GPU, improve upsampling projection
* Address most comments by @dhgarrette
* Remove decoder logic
* Improve Canine tests, improve docs of CanineConfig
* All tokenizer tests passing
* Make fix-copies and fix tokenizer tests
* Fix test_model_outputs_equivalence test
* Apply suggestions from @sgugger's review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Address some more comments
* Add support for hidden_states and attentions of shallow encoders
* Define custom CanineModelOutputWithPooling, tests pass
* First pass
* More progress
* Add support for local attention
* More improvements
* More improvements
* Conversion script working
* Add CanineTokenizer
* Make style & quality
* First draft of integration test
* Remove decoder test
* Improve tests
* Add documentation
* Mostly docs improvements
* Add CanineTokenizer tests
* Fix most tests on GPU, improve upsampling projection
* Address most comments by @dhgarrette
* Remove decoder logic
* Improve Canine tests, improve docs of CanineConfig
* All tokenizer tests passing
* Make fix-copies and fix tokenizer tests
* Fix test_model_outputs_equivalence test
* Apply suggestions from @sgugger's review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Address some more comments
* Make conversion script work for Canine-c too
* Fix tokenizer tests
* Remove file
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Document patch release v4.8.2
* fix typo in mt5 configuration docstring (#12432)
* Add to talks section (#12442)
* [JAX/Flax readme] add philosophy doc (#12419)
* add philosophy doc
* fix typos
* update doc
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* address Patricks suggestions
* add a training example and fix typos
* jit the training step
* jit train step
* fix example code
* typo
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [Flax] Add wav2vec2 (#12271)
* fix_torch_device_generate_test
* remove @
* start flax wav2vec2
* save intermediate
* forward pass has correct shape
* add weight norm
* add files
* finish ctc
* make style
* finish gumbel quantizer
* correct docstrings
* correct some more files
* fix vit
* finish quality
* correct tests
* correct docstring
* correct tests
* start wav2vec2 pretraining script
* save intermediate
* start pretraining script
* finalize pretraining script
* finish
* finish
* small typo
* finish
* correct
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* make style
* push
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Add missing Copied from statements
* Reference model uploaded under Google org
* Fix various duplicates from merging
* Rembert-large -> rembert, fix overeager Copied from, return type
* Incorporate PR comments from Patrick and Sylvain
Co-authored-by: ctheodoris <seanymphoceana@yahoo.com>
Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Nick Lane-Smith <nlanesmith@gmail.com>
Co-authored-by: Shiro T <stsuchi@users.noreply.github.com>
Co-authored-by: Wang Ran (汪然) <wrran@outlook.com>
Co-authored-by: Ahmet Akkoç <themadprogramer@gmail.com>
Co-authored-by: francescorubbo <francescorubbo@users.noreply.github.com>
Co-authored-by: Daniel Stancl <46073029+stancld@users.noreply.github.com>
Co-authored-by: talkhaldi <tareq.alkhaldi@gmail.com>
Co-authored-by: joerenner <joepeterrenner@gmail.com>
Co-authored-by: jrenner <joseph.renner@inria.fr>
Co-authored-by: Avital Oliver <avitalo@google.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
Co-authored-by: Josh Tanner <mindful.jt@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Bhadresh Savani <bhadreshpsavani@gmail.com>
Co-authored-by: Jayendra <jayendra0parmar@gmail.com>
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Philip May <philip@may.la>
Co-authored-by: Nicholas Vadivelu <nicholas.vadivelu@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Shamane Siri <shamane@ahlab.org>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: Fan Zhang <zhangfan.tju@gmail.com>
Co-authored-by: Riccardo Bassani <48254418+BassaniRiccardo@users.noreply.github.com>
Co-authored-by: Volodymyr Byno <volodymyr.byno@gmail.com>
Co-authored-by: Jeoung-Minju <51041861+JminJ@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Alberto Villa <a.villa.diez@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Gunjan Chhablani <chhablani.gunjan@gmail.com>
Co-authored-by: Kou Yong Kang <kou.yongkang@dhs.sg>
Co-authored-by: Shiva Pundir <36535845+ceevaaa@users.noreply.github.com>
Co-authored-by: François Lagunas <francois.lagunas@gmail.com>
Co-authored-by: Peter Izsak <232524+peteriz@users.noreply.github.com>
Co-authored-by: Russell Klopfer <russell@klopfer.us>
Co-authored-by: Mario Šaško <mariosasko777@gmail.com>
Co-authored-by: cdleong <4109253+cdleong@users.noreply.github.com>
Co-authored-by: Koichi Yasuoka <yasuoka@kanji.zinbun.kyoto-u.ac.jp>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: kumapo <kumapo@users.noreply.github.com>
Co-authored-by: Tobias Norlund <tobias@norlund.se>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Bhavitvya Malik <bhavitvya.malik@gmail.com>
Co-authored-by: Jonathan Chang <31893406+cccntu@users.noreply.github.com>
Co-authored-by: Guido Novati <16716298+novatig@users.noreply.github.com>
Co-authored-by: Guido Novati <gnovati@nvidia.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Co-authored-by: Nicholas Broad <nbroad94@gmail.com>
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
Co-authored-by: Kumar Abhishek <kr.abhish@gmail.com>
Co-authored-by: Kumar Abhishek <kabhishek@expedia.com>
Co-authored-by: Will Rice <will@spokestack.io>
Co-authored-by: Vasudev Gupta <7vasudevgupta@gmail.com>
Co-authored-by: Kilian Kluge <32523967+ionicsolutions@users.noreply.github.com>
Co-authored-by: Amog Kamsetty <amogkam@users.noreply.github.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
Co-authored-by: Xa9aX ツ <mishradiganta91@gmail.com>
Co-authored-by: Vishal Burman <vishal.a.burman23@gmail.com>
Co-authored-by: Hamid Shojanazeri <hamid.nazeri2010@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: David Fan <30608893+jiafatom@users.noreply.github.com>
Co-authored-by: chenht2010 <chenht2010@yahoo.com>
Co-authored-by: chenhaitao <chenhaitao@qiyi.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Co-authored-by: Michael Benayoun <michael@huggingface.co>
Co-authored-by: Sam Havens <47401552+sam-qordoba@users.noreply.github.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: michal pitr <21157924+MichalPitr@users.noreply.github.com>
Co-authored-by: jglaser <glaserj@ornl.gov>
Co-authored-by: Kai Fricke <krfricke@users.noreply.github.com>
Co-authored-by: cronoik <johannes.schaffrath@mail.de>
Co-authored-by: Taha ValizadehAslani <47432410+TahaAslani@users.noreply.github.com>
Co-authored-by: Suzana Ilić <io.suzanai@gmail.com>
Co-authored-by: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
Co-authored-by: Will Rice <wrice20@gmail.com>
Co-authored-by: Jabin Huang <huangjipengnju@gmail.com>
Co-authored-by: Jipeng Huang <jihuan@microsoft.com>
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: fcakyon <34196005+fcakyon@users.noreply.github.com>
* Proposal
* Testing pipelines slightly better.
- Overall same design
- Metaclass to get proper different tests instead of subTest (not well
supported by Pytest)
- Added ANY meta object to make output checking more readable.
- Skipping architectures either without tiny_config or without
architecture.
* Small fix.
* Fixing the tests in case of None value.
* Oups.
* Rebased with more architectures.
* Fixing reformer tests (no override anymore).
* Adding more options for model tester config.
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
This fixes the padded batch [issue](https://github.com/huggingface/transformers/issues/12282). The error was generated due to the maximum sequence length of the attention mask not matching the padded sequence length of the hidden_states. `np.allclose` now passes with a 1e-2 absolute tolerance.
This change fixes
* Add README_zh-tw.md
* Add links to each README.
* Fix a mismatched term.
* Minor improvements.
* Rename language code to be more inclusive.
* Polish terms to make them fluent.
* Remove redundant spaces.
* Fix typo.
* Base test
* More test
* Fix mistake
* Add a docstring change
* Add doc ignore
* Add changes
* Add recursive dep search
* Add recursive dep search
* save
* Finalize test mapping
* Fix bug
* Print prettier
* Ignore comments and empty lines
* Make script runnable from anywhere
* Need dev install
* Like that
* Adapt
* Add as artifact
* Try on torch tests
* Fix yaml error
* Install GitPython
* Apply everywhere
* Be more defensive
* Revert to all tests if something is wrong
* Install GitPython
* Test if there are tests before launching.
* Fixes
* Fixes
* Fixes
* Fixes
* Bash syntax is horrible
* Be less stupid
* Try differently
* Typo
* Typo
* Typo
* Style
* Better name
* Escape quotes
* Ignore black unhelpful re-formatting
* Not a docstring
* Deal with inits in dependency map
* Run all tests once PR is merged.
* Add last job
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Stronger dependencies gather
* Ignore empty lines too!
* Clean up
* Fix quality
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* We need to provide mask_time_indices to `_mask_hidden_states` to avoid applying the mask two times
* apply the same to wav2vec2
* Uniformize the style between hubert and wav2vec2
* fix tf as well
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Adding TF translation example
* Fixes and style pass for TF translation example
* Remove unused postprocess_text copied from run_summarization
* Adding README
* Review fixes
* Move changes to model.config to after we've initialized the model
* fix_torch_device_generate_test
* remove @
* correct greedy search
* save intertmed
* add final logits bias
* correct
* up
* add more tests
* fix another bug
* finish tests
* finish marian tests
* up
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Add option to load a pretrained model with mismatched shapes
* Fail at loading when mismatched shapes in Flax
* Fix tests
* Update src/transformers/modeling_flax_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Wrong model is used, should be character instead of subword
In the original Google repo for CANINE there was mixup in the model names in the README.md, which was fixed 2 weeks ago. Since this transformer model was created before, it probably resulted in wrong use in this example.
s = subword, c = character
* canine.rst style fix
* Update docs/source/model_doc/canine.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Styling canine.rst
* Added links to model cards.
* Fixed links to model cards.
Co-authored-by: Jeroen Steggink <978411+jsteggink@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* README Translation for Chinese (Simplified)
* update link
* h3->h4
* html refactor
* update model list
* fix
* Add a translation guide
* format
* update
* typo
* Refine wording
* Pass model_kwargs when loading a model in pipeline
* Add test for model_kwargs parameter of pipeline()
* Rewrite test to not download model
* Fix failing style checks
* Base test
* More test
* Fix mistake
* Add a docstring change
* Add doc ignore
* Simplify logic for unk token in Unigram tokenizers
* Remove changes from otehr branch
* This will reduce "Already borrowed error":
Original issue https://github.com/huggingface/tokenizers/issues/537
The original issue is caused by transformers calling many times
mutable functions on the rust tokenizers.
Rust needs to guarantee that only 1 agent has a mutable reference
to memory at a given time (for many reasons which don't need explaining
here). Usually, the rust compiler can guarantee that this property is
true at compile time.
Unfortunately, this is impossible for Python to do that, so PyO3, the
bridge between rust and python used by `tokenizers`, will change the
compile guarantee for a dynamic guarantee, so if multiple agents try
to have multiple mutable borrows at the same time, then the runtime will
yell with "Already borrowed".
The proposed fix here in transformers, is simply to reduce the actual
number of calls that really need mutable borrows. By reducing them,
we reduce the risk of running into "Already borrowed" error.
The caveat is now we add a call to read the current configuration of the
`_tokenizer`, so worst case we have 2 calls instead of 1, and best case
we simply have 1 + a Python comparison of a dict (should be negligible).
* Adding a test.
* trivial error :(.
* Update tests/test_tokenization_fast.py
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* Adding reference to original issues in the tests.
* Update the tests with fast tokenizer.
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* [model.from_pretrained] raise exception early on failed load
Currently if `load` pretrained weights fails in `from_pretrained`, we first print a whole bunch of successful messages and then fail - this PR puts the exception first to avoid all the misleading messages.
* style
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Fixing the pipeline optimization by rescaling the logits first.
* Add test for target equivalence
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Laying down building stone for more flexible ONNX export capabilities
* Ability to provide a map of config key to override before exporting.
* Makes it possible to export BART with/without past keys.
* Supports simple mathematical syntax for OnnxVariable.repeated
* Effectively apply value override from onnx config for model
* Supports export with additional features such as with-past for seq2seq
* Store the output path directly in the args for uniform usage across.
* Make BART_ONNX_CONFIG_* constants and fix imports.
* Support BERT model.
* Use tokenizer for more flexibility in defining the inputs of a model.
* Add TODO as remainder to provide the batch/sequence_length as CLI args
* Enable optimizations to be done on the model.
* Enable GPT2 + past
* Improve model validation with outputs containing nested structures
* Enable Roberta
* Enable Albert
* Albert requires opset >= 12
* BERT-like models requires opset >= 12
* Remove double printing.
* Enable XLM-Roberta
* Enable DistilBERT
* Disable optimization by default
* Fix missing setattr when applying optimizer_features
* Add value field to OnnxVariable to define constant input (not from tokenizers)
* Add T5 support.
* Simplify model type retrieval
* Example exporting token_classification pipeline for DistilBERT.
* Refactoring to package `transformers.onnx`
* Solve circular dependency & __main__
* Remove unnecessary imports in `__init__`
* Licences
* Use @Narsil's suggestion to forward the model's configuration to the ONNXConfig to avoid interpolation.
* Onnx export v2 fixes (#12388)
* Tiny fixes
Remove `convert_pytorch` from onnxruntime-less runtimes
Correct reference to model
* Style
* Fix Copied from
* LongFormer ONNX config.
* Removed optimizations
* Remvoe bad merge relicas.
* Remove unused constants.
* Remove some deleted constants from imports.
* Fix unittest to remove usage of PyTorch model for onnx.utils.
* Fix distilbert export
* Enable ONNX export test for supported model.
* Style.
* Fix lint.
* Enable all supported default models.
* GPT2 only has one output
* Fix bad property name when overriding config.
* Added unittests and docstrings.
* Disable with_past tests for now.
* Enable outputs validation for default export.
* Remove graph opt lvls.
* Last commit with on-going past commented.
* Style.
* Disabled `with_past` for now
* Remove unused imports.
* Remove framework argument
* Remove TFPreTrainedModel reference
* Add documentation
* Add onnxruntime tests to CircleCI
* Add test
* Rename `convert_pytorch` to `export`
* Use OrderedDict for dummy inputs
* WIP Wav2Vec2
* Revert "WIP Wav2Vec2"
This reverts commit f665efb04c92525c3530e589029f0ae7afdf603e.
* Style
* Use OrderedDict for I/O
* Style.
* Specify OrderedDict documentation.
* Style :)
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Adding support for `pipeline("automatic-speech-recognition")`.
- Ugly `"config"` choice for AutoModel. It would be great to have the
possibility to have something like `AutoModelFor` that would implement
the same logic (Load the config, check Architectures and load the first
one)
* Remove `model_id` was not needed in the end.
* Rebased !
* Remove old code.
* Rename `nlp`.
* Validation split percentage to be used for custom data files also
Issue same as https://github.com/huggingface/transformers/issues/12406 fixed for pytorch branch run_mlm.py
* Validation split added in the right place
* Update run_clm.py
* validation split added for custom files
* Validation split added for custom files
* Update run_plm.py
* fixed validation split for custom files as input for pytorch examples in lm
* Update run_clm_no_trainer.py
* args modified
* Copy BART to MBart and rename some stuff
* Add copy statements pointing to FlaxBart
* Update/add some common files
* Update shift_tokens_rigth + fix imports
* Fix shift_tokens_right method according to MBart implementation
* Update shift_tokens_right in tests accordingly
* Fix the import issue and update docs file
* make style quality
* Do some minor changes according to patil-suraj suggestions
* Change the order of normalization layer and attention
* Add some copu statementes
* Update generate method and add integration test for mBart
* Make a few updates after a review
Besides, add `lang_code_to_id` to MBartTokenizeFast
* fix-copies; make style quality
* Apply suggestions from code review
* Apply suggestions from code review
* Apply suggestions from code review
* fix output type, style
* add copied from
* resolve conflicts
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* fix_torch_device_generate_test
* remove @
* upload
* finish dataset streaming
* adapt readme
* finish
* up
* up
* up
* up
* Apply suggestions from code review
* finish
* make style
* make style2
* finish
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Validation split added: custom data files
Validation split added in case of no validation file and loading custom data
* Updated documentation with custom file usage
Updated documentation with custom file usage
* Update README.md
* Update README.md
* Update README.md
* Made some suggested stylistic changes
* Used logger instead of print.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Made similar changes to add validation split
In case of a missing validation file, a validation split will be used now.
* max_train_samples to be used for training only
max_train_samples got misplaced, now corrected so that it is applied on training data only, not whole data.
* styled
* changed ordering
* Improved language of documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Improved language of documentation
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fixed styling issue
* Update run_mlm.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add philosophy doc
* fix typos
* update doc
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* address Patricks suggestions
* add a training example and fix typos
* jit the training step
* jit train step
* fix example code
* typo
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* First pass
* More progress
* Add support for local attention
* More improvements
* More improvements
* Conversion script working
* Add CanineTokenizer
* Make style & quality
* First draft of integration test
* Remove decoder test
* Improve tests
* Add documentation
* Mostly docs improvements
* Add CanineTokenizer tests
* Fix most tests on GPU, improve upsampling projection
* Address most comments by @dhgarrette
* Remove decoder logic
* Improve Canine tests, improve docs of CanineConfig
* All tokenizer tests passing
* Make fix-copies and fix tokenizer tests
* Fix test_model_outputs_equivalence test
* Apply suggestions from @sgugger's review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Address some more comments
* Add support for hidden_states and attentions of shallow encoders
* Define custom CanineModelOutputWithPooling, tests pass
* First pass
* More progress
* Add support for local attention
* More improvements
* More improvements
* Conversion script working
* Add CanineTokenizer
* Make style & quality
* First draft of integration test
* Remove decoder test
* Improve tests
* Add documentation
* Mostly docs improvements
* Add CanineTokenizer tests
* Fix most tests on GPU, improve upsampling projection
* Address most comments by @dhgarrette
* Remove decoder logic
* Improve Canine tests, improve docs of CanineConfig
* All tokenizer tests passing
* Make fix-copies and fix tokenizer tests
* Fix test_model_outputs_equivalence test
* Apply suggestions from @sgugger's review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Address some more comments
* Make conversion script work for Canine-c too
* Fix tokenizer tests
* Remove file
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [WIP] Easily train a new fast tokenizer from a given one
* Fix test
* Roll out to other tokenizers and add tests
* Fix bug with unk id and add emoji to test
* Really use something different in test
* Implement special tokens map
* Map special tokens in the Transformers tokenizers
* Fix test
* Make test more robust
* Fix test for BPE
* More robust map and test
Co-authored-by SaulLu
* Test file
* Stronger tests
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
* Map unk token for Wordpiece and address review comment
* Fix lowercase test and address review comment
* Fix all tests
* Simplify test
* Fix tests for realsies
* Easily train a new fast tokenizer from a given one - tackle the special tokens format (str or AddedToken) (#12420)
* Propose change in tests regarding lower case
* add new test for special tokens types
* put back the test part about decoding
* add feature: the AddedToken is re-build with the different mapped content
* Address review comment: simplify AddedToken building
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: SaulLu <lucilesaul.com@gmail.com>
Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
* Tensorflow MLM example
* Add CLM example
* Style fixes, adding missing checkpoint code from the CLM example
* Fix TPU training, avoid massive dataset warnings
* Fix incorrect training length calculation for multi-GPU training
* Fix incorrect training length calculation for multi-GPU training
* Refactors and nitpicks from the review
* Style pass
* Adding README
Before the code could not be used for validation only because of this line:
extension = data_args.train_file.split(".")[-1]
was assuming that extension must be extracted from the training dataset. This line would run regardless of the training or validation options of the user. This would lead to an error if the user only wants to run an evaluation only and does not want to do train (because the training file does not exist). I modified it to extract extension from the training file if the user wants to do train and extract it from the validation file if the user wants to run eval. This way the code can be used for both training and validation separately.
* fixed multiplechoice tokenization
The model would have seen two sequences:
1. [CLS]prompt[SEP]prompt[SEP]
2. [CLS]choice0[SEP]choice1[SEP]
that is not correct as we want a contextualized embedding of prompt and choice
* removed outer brackets for proper sequence generation
* Clean push to hub API
* Create working dir if it does not exist
* Different tweak
* New API + all models + test Flax
* Adds the Trainer clean up
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* (nit) output types
* No need to set clone_from when folder exists
* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Add generated_from_trainer tag
* Update to new version
* Fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* copy pytorch-t5
* init
* boom boom
* forward pass same
* make generation work
* add more tests
* make test work
* finish normal tests
* make fix-copies
* finish quality
* correct slow example
* correct slow test
* version table
* upload models
* Update tests/test_modeling_flax_t5.py
* correct incorrectly deleted line
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Add output args to greedy search
* Fix critical typo + make style quality
* Handle generate_beam_search
* Add dict_specific tests and fix the placement of encoder outputs
* Add specific outputs
* Update doc
* Fix typo
* Adjust handling encoder_outputs + Fix generating for T5
* Fix generate for RAG
* Fix handling ouptut_attentions when target_mapping is not None
Take care of situations when target_mapping is provided
as there are 2-tuple of attentions
Change from:
if inputs["output_attentions"]:
attentions = tuple(tf.transpose(t, perm(2, 3, 0, 1)) for t in attentions)
to:
if inputs["output_attentions"]:
if inputs["target_mapping"] is not None:
# when target_mapping is provided, there are 2-tuple of attentions
attentions = tuple(
tuple(tf.transpose(attn_stream, perm=(2, 3, 0, 1)) for attn_stream in t) for t in attentions
)
else:
attentions = tuple(tf.transpose(t, perm=(2, 3, 0, 1)) for t in attentions)
* Rename kwargs to model_kwargs
* make style quality
* Move imports in test_modeling_tf_common.py
Move ModelOutput-related imports in test_modeling_tf_common.py
into the `is_tf_available():` statement.
* Rewrite nested if-statements
* Fix added tests
* Optimizing away the `fill-mask` pipeline.
- Don't send anything to the tokenizer unless needed. Vocab check is
much faster
- Keep BC by sending data to the tokenizer when needed. User handling warning messages will see performance benefits again
- Make `targets` and `top_k` work together better `top_k` cannot be
higher than `len(targets)` but can be smaller still.
- Actually simplify the `target_ids` in case of duplicate (it can happen
because we're parsing raw strings)
- Removed useless code to fail on empty strings. It works only if empty
string is in first position, moved to ignoring them instead.
- Changed the related tests as only the tests would fail correctly
(having incorrect value in first position)
* Make tests compatible for 2 different vocabs... (at the price of a
warning).
Co-authored-by: @EtaoinWu
* ValueError working globally
* Update src/transformers/pipelines/fill_mask.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* `tokenizer.vocab` -> `tokenizer.get_vocab()` for more compatiblity +
fallback.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Replace conditional generation example (fixes#12268)
* Replace model in summarization example with finetuned checkpoint, adapt example text
* Fix typo in new summarization example
* Fix docstring formatting, add missing import statement to example
* registering a buffer for token_type_ids, to pass the error of device-id getting hardcoded when tracing
* sytle format
* adding persistent flag to the resgitered buffers that prevent from adding them to the state_dict and addresses the Backward compatibility issue
* adding the try catch to the fix as persistent flag is only available from PT >1.6
* adding version check
* added the condition to only use the token_type_ids buffer when its autogenerated not passed by user
* adding comments and making the conidtion where token_type_ids are None to use the registered buffer
* taking out position-embeddding from the if block
* adding comments
* handling the case if buffer for position_ids was not registered
* reverted the changes on position_ids, fix the issue with size of token_type_ids buffer, moved the modification for generated token_type_ids to Bertmodel, instead of Embeddings
* reverting the token_type_ids in case of None to the previous version
* reverting changes on position_ids adding back the if block
* changes added by running make fix-copies
* changes added by running make fix-copies and added the import version as it was getting used
* changes added by running make fix-copies
* changes added by running make fix-copies
* fixing the import format
* fixing the import format
* modified to use temp tensor for trimed and expanded token_type_ids buffer
* changes made by fix-copies after temp tensor modifications
* changes made by fix-copies after temp tensor modifications
* changes made by fix-copies after temp tensor modifications
* clean up
* clean up
* clean up
* clean up
* Nit
* Nit
* Nit
* modified according to support device conversion on traced models
* modified according to support device conversion on traced models
* modified according to support device conversion on traced models
* modified according to support device conversion on traced models
* changes based on latest in master
* Adapt templates
* Add version import
Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-81.us-west-2.compute.internal>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* boom boom
* remove flax clip example
* allow loading head model with base model weights
* add test
* fix imports
* disable save, load test for clip
* add test_save_load_to_base
* [run_clm.py] restore caching
* style
* [t5 doc] make the example work out of the box
This PR expands the training example to include the correct model type for the example to work, e.g. with `T5Model` this example will break.
* Update docs/source/model_doc/t5.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* expand the other example
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* AutoTokenizer: infer the class from the tokenizer config if possible
* Add tests
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Temporarily deactivate torch-scatter while we wait for new release
* torch-1.8.1 binary for scatter
* Revert to 1.8.0
* Pin torch dependency
* torchaudio and torchvision
* [WIP] Model card defaults
* finetuned_from default value
* Add all mappings to the mapping file
* Be more defensive on finetuned_from arg
* Add default task tag
* Separate tags from tasks
* Edge case for dataset
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
- Convert use of deprecated AutoModelWithLMHead to AutoModelForSeq2SeqLM
- Add newly required `truncation=True` to `tokenizer.encode` with `max_length`
This silences all warnings.
* [WIP] Add TFWav2Vec2Model
Work in progress for adding a tensorflow version of Wav2Vec2
* feedback changes
* small fix
* Test Feedback Round 1
* Add SpecAugment and CTC Loss
* correct spec augment mask creation
* docstring and correct copyright
* correct bugs
* remove bogus file
* finish tests correction
* del unnecessary layers
* Update src/transformers/models/wav2vec2/modeling_tf_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make style
* correct final bug
* Feedback Changes
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Use text_column_name variable instead of "text"
`text_column_name` was already defined above where I made the changes and it was also used below where I made changes.
This is a very minor change. If a dataset does not use "text" as the column name, then the `tokenize_function` will now use whatever column is assigned to `text_column_name`. `text_column_name` is just the first column name if "text" is not a column name. It makes the function a little more robust, though I would assume that 90% + of datasets use "text" anyway.
* black formatting
* make style
Co-authored-by: Nicholas Broad <nicholas@nmbroad.com>
* feature for tokenizer without slow/legacy version
* format
* modify common test
* add tests
* add PreTrainedTokenizerFast to AutoTokenizer
* format
* change tokenizer common test in order to be able to run test without a slow version
* update tokenizer fast test in order to use `rust_tokenizer_class` attribute instead of `tokenizer_class`
* add autokenizer test
* replace `if self.tokenizer_class is not None` with ` if self.tokenizer_class is None`
* remove obsolete change in comment
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/tokenization_utils_fast.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* change `get_main_tokenizer` into `get_tokenizers`
* clarify `get_tokenizers` method
* homogenize with `test_slow_tokenizer` and `test_rust_tokenizer`
* add `test_rust_tokenizer = False` to tokenizer which don't define a fast version
* `test_rust_tokenizer = False` for BertJapaneseTokenizer
* `test_rust_tokenizer = False` for BertJapaneseCharacterTokenizationTest
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Start working on FlaxBart
* Create modeling_flax_bart.py
* Write FlaxBartAttention
* Add FlaxBartEncoderLayer
* Add FlaxBartDecoderLayer and some typing
* Add helepr function for FlaxBart
* shift_tokens_right
* _make_causal_mask
* _expand_mask
* Add PositionalEmbedding and fix init_std naming
* Add FlaxBartPretrainedModel
* Add FlaxBartEncoder
* Add FlaxBartEncoder
* Add FlaxBartEncoder among modules to be imported
* YET WE CANNOT INITIALIZE THAT!! :(
* Make BartEncoder working
Change BartEncoder to instance of nn.Module so far
* Add FlaxBartDecoder
* Add FlaxBartModel
* TODO to make model run -> Prepapre model inputs
* Resolve padding
* Add FlaxBartModel
* Add FlaxBartModel into importable modules
* Remove FlaxBartEncoder and FlaxBartDecoder from importable modules
* make style; not properly working
* make style; make quality not pass due to some import I left
* Remove TODO for padding_idx in nn.Embed so far
* Add FlaxBartForConditionalGeneration
* Incorporate Flax model output classes, i.e. return_dict
* Add another models and incorporate use_cache arg
* Add FlaxBartForSequenceClassification and FlaxBartForQuestionAnswering
* Incorporate use_cache arg from PyTorch implementation
* Add all necessary Flax output utils
* Add FlaxBartForCausalLM; not working yet'
* Add minor improvements; still lacks some functionality
* Update docs, src and tests
* Add support of FlaxBart to docs/source
* Fix some bugs in FlaxBart souce code
* Add some neccessary tests for FlaxBart models - jit_compilation not passing
* Fix tests and add test_head_masking
* Fix tests for @jax.jit computation
* Add test_head_masking
* Migrate FlaxBart tests from jax.numpy to numpy
* Remove FlaxBartForCausalLM
* Clean repo
* fix bart model weight structure
* Fix FlaxBartForSequenceClassification
Slicing is not possible to use below jit, therefore, selecting sentence
representation from hidden_states must be changed.
* Allow FlaxBartForSequenceClassification for testing pt_flax equivalence
* Allow testing for FlaxBartForQA for pt_flax equivalence
* Add a comment to FlaxBartForSequenceClassification + change noise from 1e-3 to 1e-6
* remove past_key_values
* remove inputs_mebeds and make input_ids required
* add position ids
* re-write attention layer
* fix dataclass
* fix pos embeds and attention output
* fix pos embeds
* expose encode method
* expose decode method
* move docstring to top
* add cache for causal attn layer
* remove head masking for now
* s2s greedy search first pass
* boom boom
* fix typos
* fix greedy generate for bart
* use encoder, decoder layers instead of num_hidden_layers
* handle encoder_outputs
* cleanup
* simplify decoding
* more clean-up
* typos
* Change header + add {decoder_,}position_ids into 2 models
* add BartConfig
* fix existing tests
* add encode, decode methods
* Fix shift_tokens_right for JIT compilation + clarify one condition
* fix decode
* encoder => encode
* simplify generate
* add tests for encode and decode
* style
* add tests for cache
* fix equivalence tests
* sample generate now works with seq2seq
* generation tests
* initialize dense layers
* docstring and cleanup
* quality
* remove get/set input_embeddings
* address Patricks suggestions
* decode for every model, remove encoder_outputs from call
* update tests accordingly
* decode returns only decoder outputs and logits
* fix arguments
* doc encode, decode methods
* correct base_model_prefix
* fix test for seq classif model
* fix docs
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* add readme for flax clm
* use section link for tokenizer
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* update metrics
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix megatron_gpt2 attention block's causal mask.
* compatibility with checkpoints created with recent versions of Megatron-LM
* added integration test for the released Megatron-GPT2 model
* code style changes
* added option to megatron conversion script to read from config file
Co-authored-by: Guido Novati <gnovati@nvidia.com>
* adding vit for flax
* added test for Flax-vit and some bug-fixes
* overrided methods where variable changes were necessary for flax_vit test
* added FlaxViTForImageClassification for test
* Update src/transformers/models/vit/modeling_flax_vit.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* made changes suggested in PR
* Adding jax-vit models for autoimport
* swapping num_channels and height,width dimension
* fixing the docstring for torch-like inputs for VIT
* add model to main init
* add docs
* doc, fix-copies
* docstrings
* small test fixes
* fix docs
* fix docstr
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix a condition in test_generate_with_head_masking
* Fix usage of head_mask in bigbirg_pegasus
* Fix head masking for speech2text
* Resolve copy mismatch + drop unwanted print statement
* Fix the condition
* Pushing partially-complete new GLUE example
* First draft of the new TF GLUE example! Needs a little more testing to be sure but it's almost ready.
* Fix to the fit() call
* Bugfixes, making sure TPU and multi-GPU support is ready
* Remove logger line that depends on Pytorch
* Style pass
* Deleting old TF GLUE example
* Include label2id and id2label in the saved model config
* Don't clobber the existing model.config.label2id
* Style fixes
* Update examples/tensorflow/text-classification/run_glue.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Squash all commits of modeling_detr_v7 branch into one
* Improve docs
* Fix tests
* Style
* Improve docs some more and fix most tests
* Fix slow tests of ViT, DeiT and DETR
* Improve replacement of batch norm
* Restructure timm backbone forward
* Make DetrForSegmentation support any timm backbone
* Fix name of output
* Address most comments by @LysandreJik
* Give better names for variables
* Conditional imports + timm in setup.py
* Address additional comments by @sgugger
* Make style, add require_timm and require_vision to testsé
* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone
* Add png files to fixtures
* Fix type hint
* Add timm to workflows
* Add `BatchNorm2d` to the weight initialization
* Fix retain_grad test
* Replace model checkpoints by Facebook namespace
* Fix name of checkpoint in test
* Add user-friendly message when scipy is not available
* Address most comments by @patrickvonplaten
* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner
* Better initialization
* Scipy is necessary to get sklearn metrics
* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel
* Make style
* Improve docs and add 2 community notebooks
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* updated the original RAG implementation to be compatible with the latest PL version
* updated the requirements.txt file
* execute make style
* code quality test
* code quality
* conflix resolved in requirement.txt
* code quality
* changed the MyDDP class name to CustomDDP
* Fixing bug that appears when using distilation (and potentially other uses).
During backward pass Pytorch complains with:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
This happens because the QA model code modifies the start_positions and end_positions input tensors, using clamp_ function: as a consequence the teacher and the student both modifies the inputs, and backward pass fails.
* Fixing all models QA clamp_ bug.
* Fix weight decay masking in `run_flax_glue.py`
Issues with the previous implementation:
- The `dict` from `traverse_util.flatten_dict` has keys which are tuples of strings, not one long string with the path separated by periods.
- `optax.masked` applies the transformation wherever the mask is True, so the masks are flipped.
- Flax's LayerNorm calls the scale parameter `scale` not `weight`
* Fix formatting with black
* adapt results
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* add test_vocab_size for sentencepiece tok.
* add test_get_vocab for sentencepiece tok.
* add test_convert_token_and_id for sentencepiece tok.
* add test_tokenize_and_convert_tokens_to_string for all tok.
* improve test_tokenize_and_convert_tokens_to_string for sp. tok.
* add common tokenizer integration tests
- for albert
- for barthez
* add tokenizer integration tests to bert gen.
* add most tokenizer integration tests
* fix camembert tokenizer integration test
* add tokenizer integration test to marian
* add tokenizer integration test to reformer
* add typing and doc to tokenizer_integration_test_util
* fix tokenizer integration test of reformer
* improve test_sentencepiece_tokenize_and_convert_tokens_to_string
* empty commit to trigger CI
* fix tokenizer integration test of reformer
* remove code not needed anymore
* empty commit to trigger CI
* empty commit to trigger CI
* initial
* code quality test
* code quality
* added test functions in test_modeling_rag.py and test_retrieval_rag.py to test end2end retreiver
* minor change in test_modeling_rag
* fixed tests
* Update examples/research_projects/rag-end2end-retriever/README.md
typo corrected as suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update examples/research_projects/rag-end2end-retriever/finetune_rag.py
type change suggested by lhoestq
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Update src/transformers/models/rag/retrieval_rag.py
Adding this change as mentioned by lhoestq.
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* completed the minor changes suggested by the reviewers
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
* Remove redundant `nn.log_softmax` in `run_flax_glue.py`
`optax.softmax_cross_entropy` expects unnormalized logits, and so it already calls `nn.log_softmax`, so I believe it is not needed here. `nn.log_softmax` is idempotent so mathematically it shouldn't have made a difference.
* Remove unused 'flax.linen' import
* Added logic to return attention from flax-bert model and added test cases to check that
* Added new line at the end of file to test_modeling_flax_common.py
* fixing code style
* Fixing Roberta and Elextra models too from cpoying bert
* Added temporary hack to not run test_attention_outputs for FlaxGPT2
* Returning attention weights from GPT2 and changed the tests accordingly.
* last fixes
* bump flax dependency
Co-authored-by: jayendra <jayendra@infocusp.in>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adding new argument `max_new_tokens` for generate.
This is a proposal to add a new argument `max_new_tokens` to `generate`.
This include a `MaxNewTokensCriteria` that enables callers that don't
know about the token length ahead (like pipelines callers) to manage
more easily the length of their generated output.
* Adding a test for the user warning when both`max_length` and
`max_new_tokens` are used together.
* Removed redundant `no_grad`.
* changing find_batch_size to work with tokenizer outputs
trainer_pt_utils.find_batch_size does not recognize the batch size of BatchEncoding objects. This can cause an error when a trainer relies on find_batch_size to report the number of observed examples in the evaluation loop.
* Trigger CI
Co-authored-by: jrenner <joseph.renner@inria.fr>
* Correcting comments to reflect correct tuple order
In order to match the actual order (line 513 and 516, and as accessed in 968), I've changed the order mentioned in comments L962 and L966-967.
* Update modeling_t5.py
Updating another comment as well
* Removing extra space
* Fixing style and quality
* style & quality
* Update src/transformers/models/t5/modeling_t5.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix Bart
* Fix Blenderbot{,_small}
* Fix LED
* Fix Marian
* Fix MBart
* Fix Pegasus
* Fix T5
* Add test for generation with head_mask
* Add a common TF test
* Override a test for the LED model as head masking is not yet properly implemented
* Remove all head_masks from input preparation for LED
* Drop masking for T5 as it needs a bit of refactor
The feature extractor does not create tensors on the appropriate device,
so we call `ensure_tensor_on_device` before feeding the processed inputs
to the model.
* [Trainer] Report both steps and num samples per second
* Fix batch number
* Update src/transformers/trainer_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Address review comments
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
get_length_grouped_indices() in LengthGroupedSampler and DistributedLengthGroupedSampler
is prohibitively slow for large number of megabatches (in test case takes hours for ~270k
megabatches with 100 items each) due to slow list concatenation with sum(megabatches, []).
Resolves: #11795
Co-authored-by: ctheodoris <cvtheodo@ds.dfci.harvard.edu>
* add separator for windows
* fixes test_is_copy_consistent on Windows
* fixing writing encoding issue on extended test (for Windows)
* resolving comments
Cleaner and more scalable implementation of symbolic tracing with torch.fx, and provides support for new architectures:
- ALBERT
- DistilBERT
- MobileBERT
- MegatronBERT
- GPT2
- GPT Neo
Co-authored-by: Michael Benayoun <michael@huggingface.co>
* Add missing head masking for generate() function
* Add head_mask, decoder_head_mask and cross_attn_head_mask
into prepare_inputs_for_generation for generate() function
for multiple encoder-decoder models.
* Add test_genereate_with_head_masking
* [WIP] Update the new test and handle special cases
* make style
* Omit ProphetNet test so far
* make fix-copies
* [TokenClassification] Label realignment for subword aggregation
Tentative to replace https://github.com/huggingface/transformers/pull/11622/files
- Added `AggregationStrategy`
- `ignore_subwords` and `grouped_entities` arguments are now fused
into `aggregation_strategy`. It makes more sense anyway because
`ignore_subwords=True` with `grouped_entities=False` did not have a
meaning anyway.
- Added 2 new ways to aggregate which are MAX, and AVERAGE
- AVERAGE requires a bit more information than the others, for now this
case is slightly specific, we should keep that in mind for future
changes.
- Testing has been modified to reflect new argument, and to check the
correct deprecation and the new aggregation_strategy.
- Put the testing argument and testing results for aggregation_strategy,
close together, so that readers can understand what is supposed to
happen.
- `aggregate` is now only tested on a small model as it does not mean
anything to test it globally for all models.
- Previous tests are unchanged in desired output.
- Added a new test case that showcases better the difference between the
FIRST, MAX and AVERAGE strategies.
* Wrong framework.
* Addressing three issues.
1- Tags might not follow B-, I- convention, so any tag should work now
(assumed as B-TAG)
2- Fixed an issue with average that leads to a substantial code change.
3- The testing suite was not checking for the "index" key for "none"
strategy. This is now fixed.
The issue is that "O" could not be chosen by AVERAGE strategy because
those tokens were filtered out beforehand, so their relative scores were
not counted in the average. Now filtering on
ignore_labels will happen at the very end of the pipeline fixing
that issue.
It's a bit hard to make sure this stays like that because we do
not have a end-to-end test for that behavior
* Formatting.
* Adding formatting to code + cleaner handling of B-, I- tags.
Co-authored-by: Francesco Rubbo <rubbo.francesco@gmail.com>
Co-authored-by: elk-cloner <rezakakhki.rk@gmail.com>
* Typo.
Co-authored-by: Francesco Rubbo <rubbo.francesco@gmail.com>
Co-authored-by: elk-cloner <rezakakhki.rk@gmail.com>
* Add 3D attention mask to T5 model (#9643)
Added code for 3D attention mask in T5 model. Similar to BERT model.
* Add test for 3D attention mask
Added test for 3D attention mask: test_decoder_model_past_with_3d_attn_mask()
3D attention mask of the shape [Batch_size, Seq_length, Seq_length] both for
attention mask and decoder attention mask. Test is passing.
* improve slow class tok usage at xlm rob
* add subword regularization for barthez
* improve barthez tok. test
* fix tokenizer tests
* add subword regularization for camembert
* add subword regularization for deberta v2 tokenizer
* add more doc to deberta v2 tokenizer
* add subword regularization for speech to text tok.
* fix sp_model_kwargs type in speech 2 text tok.
* add subword regularization for M2M100 tok.
* add more concrete type hints
* fix tests for m2m100 and s2t tok.
* add missing Any import
* fix syntax error in m2m100 tok.
* fix unpickle of m2m100 and s2t tok.
* fix test of m2m100 and s2t tok.
* improve unpickle of deberta v2 tok.
* add test for pickle of barthez & camembert
* fix pickle of barthez & camembert
* add test for deberta v2 tok. pickle
* fix m2m100 tok. pickle
* fix s2t tok. pickle
* add subword regularization to albert tok.
* refactor subword reg. test into TokenizerTesterMixin
improve albert tok. test
remove sample argument form albert tok.
check subword reg. using TokenizerTesterMixin
improve tok. tests
improve xlm roberta tok. tests
improve xlm roberta tok. tests
* add subword regularization for big bird t.
* improve xlm roberta tok. test
* add subword regularization for mbart50 tok.
* add subword regularization for pegasus tok.
* add subword regularization for reformer tok.
* add subword regularization for T5 tok.
* fix t5 tok. test formatting
* add subword regularization for xlm_proph. tok.
* add subword regularization for xlnet tok.
* add subword regularization for gert_gen tok.
* add typing to tokenizers
* add typing to xlm rob. tok
* add subword regularization for marian tok.
* add reverse tok. test
* fix marian tok test
* fix marian tok test
* fix casing in tok. tests
* fix style of tok. common test
* fix deberta v2 tok test
* add type annotations to tok. tests
* add type annotations to tok. __init__
* add typing to kokenizer
* add type annotations to tok. __init__
* don't specify the default when it's None
* fix barthez tok. doc
* move sentencepiece tok. tests to TokenizerTesterMixin
* fix unused imports
* fix albert tok. test
* add comment to sentencepiece test options
* fix Any import at big bird tok.
* fix Any import at xlm prophetnet tok.
* empty commit to trigger CI
* Improve docs of DeiT and ViT, add community notebook
* Add gitignore for test_samples
* Add notebook with Trainer
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Adds Flax BERT finetuning example
* fix traced jax tensor type
* Use Optax losses and learning schedulers
* Add 1GPU training results
* merge into master & make style
* fix input
* del file
* Fix bug in loss and add torch runs
* finish bert flax fine-tune
* Update examples/flax/text-classification/README.md
* Update examples/flax/text-classification/run_flax_glue.py
* add requirements
* finalize
* finalize
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Autogenerate model cards from the Trainer
* ModelCard deprecated
* Fix test
* Style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
* Quality
* With all metadata
* Metadata
* Post-merge conflict mess
* Data args and all examples
* Default license and languages when possible
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* added fix to decode function. added test to qa pipeline tests
* completed topk docstring
* fixed formatting with black
* applied style_doc to fix line length
* Added Big Bird Fast Tokenizer initial file
* style fixes
* flake fixes
* Added big bird fast tokenizer to init files
* Added big bird fast to Auto tokenization
* fix styles
* minor quality fixes
* Added initial test code
* Fix SpmConverter when precompiled_charsmap doesn't exist
* fixed post processor
* minor style fix
* minor fix input names
* Actually fix identity normalization
* style
* Added token type ids to fast tokenizer
* style
* flake fix
* fix copies
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
* Add the ImageClassificationPipeline
* Code review
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Have `load_image` at the module level
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Set generator in dataloader
* Use generator in all random samplers
* Checkpoint all RNG states
* Final version
* Quality
* Test
* Address review comments
* Quality
* Remove debug util
* Add python and numpy RNGs
* Split states in different files in distributed
* Quality
* local_rank for TPUs
* Only use generator when accepted
* Add test
* Set seed to avoid flakiness
* Make test less flaky
* Quality
* add electra model to flax
* Remove Electra Next Sentence Prediction model added by mistake
* fix parameter sharing and loosen equality threshold
* fix styling issues
* add mistaken removen imports
* fix electra table
* Add FlaxElectra to automodels and fixe docs
* fix issues pointed out the PR
* fix flax electra to comply with latest changes
* remove stale class
* add copied from
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add flax roberta
* make style
* correct initialiazation
* modify model to save weights
* fix copied from
* fix copied from
* correct some more code
* add more roberta models
* Apply suggestions from code review
* merge from master
* finish
* finish docs
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Make quality scripts work when one backend is missing.
* Check env variable is properly set
* Add default
* With print statements
* Fix typo
* Set env variable
* Remove debug code
* Rebase with master
* Minor bug fix in docs
* Copy files from adding_luke_v2 and improve docs
* change the default value of use_entity_aware_attention to True
* remove word_hidden_states
* fix head models
* fix tests
* fix the conversion script
* add integration tests for the pretrained large model
* improve docstring
* Improve docs, make style
* fix _init_weights for pytorch 1.8
* improve docs
* fix tokenizer to construct entity sequence with [MASK] entity when entities=None
* Make fix-copies
* Make style & quality
* Bug fixes
* Add LukeTokenizer to init
* Address most comments by @patil-suraj and @LysandreJik
* rename _compute_extended_attention_mask to get_extended_attention_mask
* add comments to LukeSelfAttention
* fix the documentation of the tokenizer
* address comments by @patil-suraj, @LysandreJik, and @sgugger
* improve docs
* Make style, quality and fix-copies
* Improve docs
* fix docs
* add "entity_span_classification" task
* update example code for LukeForEntitySpanClassification
* improve docs
* improve docs
* improve the code example in luke.rst
* rename the classification layer in LukeForEntityClassification from typing to classifier
* add bias to the classifier in LukeForEntitySpanClassification
* update docs to use fine-tuned hub models in code examples of the head models
* update the example sentences
* Make style & quality
* Add require_torch to tokenizer tests
* Add require_torch to tokenizer tests
* Address comments by @sgugger and add community notebooks
* Make fix-copies
Co-authored-by: Ikuya Yamada <ikuya@ikuya.net>
* prep for deepspeed==0.3.16
* new version
* too soon
* support and test fp32 mode
* troubleshooting doc start
* workaround no longer needed
* add fp32 doc
* style
* cleanup, add tf32 note
* clarify
* release was made
* Adding `AutomaticSpeechRecognitionPipeline`.
- Because we added everything to enable this pipeline, we probably
should add it to `transformers`.
- This PR tries to limit the scope and focuses only on the pipeline part
(what should go in, and out).
- The tests are very specific for S2T and Wav2vec2 to make sure both
architectures are supported by the pipeline. We don't use the mixin for
tests right now, because that requires more work in the `pipeline`
function (will be done in a follow up PR).
- Unsure about the "helper" function `ffmpeg_read`. It makes a lot of
sense from a user perspective, it does not add any additional
dependencies (as in hard dependency, because users can always use their
own load mechanism). Meanwhile, it feels slightly clunky to have so much
optional preprocessing.
- The pipeline is not done to support streaming audio right now.
Future work:
- Add `automatic-speech-recognition` as a `task`. And add the
FeatureExtractor.from_pretrained within `pipeline` function.
- Add small models within tests
- Add the Mixin to tests.
- Make the logic between ForCTC vs ForConditionalGeneration better.
* Update tests/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Adding docs + main import + type checking + LICENSE.
* Doc style !.
* Fixing TYPE_HINT.
* Specifying waveform shape in the docs.
* Adding asserts + specify in the documentation the shape of the input
np.ndarray.
* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adding require to tests + move the `feature_extractor` doc.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Implement gradient checkpoinging for T5Stack
* A bit more robust type checking
* Add `gradient_checkpointing` to T5Config
* Formatting
* Set requires_grad only when training
* None return value will only cause problems when training
* Change the output tuple according to `use_cache`
* Enable gradient checkpointing for the decoder
Squashed commit of the following:
commit 658bdd0bd1215353a8770f558bda2ea69a0ad0c7
Author: Ceshine Lee <shuanck@gmail.com>
Date: Sat Apr 24 14:08:17 2021 +0800
Only set `require_grad` for gradient checkpointing
commit acaeee6b2e675045fb28ce2176444c1d63e908bd
Author: Ceshine Lee <shuanck@gmail.com>
Date: Sat Apr 24 13:59:35 2021 +0800
Make gradient checkpointing work with the decoder
* Formatting
* removed max_len
* removed max_length from BeamSearchScorer
* correct max length
* finish
* del vim
* finish & add test
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
As the error comes from the inconsistency of variable meaning number of gpus in parser and its actual usage in the train.py script, 'gpus' and 'n_gpu' respectively, the correction makes the example work
* Add cross_attn_head_mask to BART
* Fix cross_attentions in TFBart-like models
* This commit enables returning of `cross_attentions`
for TFBart-like models
* It also fixes attention head masking in cross-attenion module
* Update TF model templates
* Fix missing , in TF model templates
* Fix typo: congig -> config
* removes the creation of separate config objects and uses the existing ones instead+overwrite resize_token_embeddings from parent class because it is not working for the EncoderDecoderModel
* rollback to current version of the huggingface master branch
* reworked version that ties the encoder and decoder config of the parent encoderdecoder instance
* overwrite of resize_token_embeddings throws an error now
* review comment suggestion
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* implemented warning in case encoderdecoder is created with differing configs of encoderdecoderconfig and decoderconfig or encoderconfig
* added test to avoid diverging configs of wrapper class and wrapped classes
* Update src/transformers/models/encoder_decoder/modeling_encoder_decoder.py
* make style
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add head_mask & decoder_head_mask + some corrections
* Fix head masking for N-grams
* Enable test_headmasking for encoder and decod
* Fix one typo regarding in modeling_propgetnet.py
* Enable test_headmasking for ProphetNetStandaloneDecoderModelTest
and ProphetNetStandaloneEncoderModelTest in test_modeling_prophetnet.py
* make style
* Fix cross_head_mask
* Fix attention head mask naming
* `cross_head_mask` -> `cross_attn_head_mask`
* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
* Still need to merge #10605 to master to pass the tests
* Fix cross-attention head mask for Torch BART models
* Fix head masking for cross-attention module for the following
models: BART, Blenderbot, Blenderbot_small, M2M_100, Marian, MBart,
Pegasus
* Enable test_headmasking for M2M_100 model
* Fix cross_head_mask for FSMT, LED and T5
* This commit fixes `head_mask` for cross-attention modules
in the following models: FSMT, LED, T5
* It also contains some smaller changes in doc so that
it is be perfectly clear the shape of `cross_head_mask`
is the same as of `decoder_head_mask`
* Update template
* Fix template for BartForCausalLM
* Fix cross_head_mask for Speech2Text models
* Fix cross_head_mask in templates
* Fix args order in BartForCausalLM template
* Fix doc in BART templates
* Make more explicit naming
* `cross_head_mask` -> `cross_attn_head_mask`
* `cross_layer_head_mask` -> `cross_attn_layer_head_mask`
* Fix doc
* make style quality
* Fix speech2text docstring
* Initial support for upload to hub
* push -> upload
* Fixes + examples
* Fix torchhub test
* Torchhub test I hate you
* push_model_to_hub -> push_to_hub
* Apply mixin to other pretrained models
* Remove ABC inheritance
* Add tests
* Typo
* Run tests
* Install git-lfs
* Change approach
* Add push_to_hub to all
* Staging test suite
* Typo
* Maybe like this?
* More deps
* Cache
* Adapt name
* Quality
* MOAR tests
* Put it in testing_utils
* Docs + torchhub last hope
* Styling
* Wrong method
* Typos
* Update src/transformers/file_utils.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>
* Address review comments
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* MOD: fit chinese wwm to new datasets
* MOD: move wwm to new folder
* MOD: formate code
* Styling
* MOD add param and recover trainer
* MOD: add token_type_ids method for big bird
* MOD: format code
* MOD: format code
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* Pass metric_key_prefix as kwarg to on_evaluate
* Replace eval_loss with metric_key_prefix_loss
* Default to "eval" if metric_key_prefix not in kwargs
* Add kwargs to CallbackHandler.on_evaluate signature
* Revert "Add kwargs to CallbackHandler.on_evaluate signature"
This reverts commit 8d4c85ed512f558f7579d36771e907b3379947b7.
* Revert "Pass metric_key_prefix as kwarg to on_evaluate"
This reverts commit 7766bfe2718601230ae593d37b1317bd53cfc075.
* Extract metric_key_prefix from metrics
* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments and clean
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Removed `max_length` from being mandatory within `generate`.
- Moving on to fully using `StoppingCriteria` for `greedy` and `sample`
modes.
- `max_length` still used for `beam_search` and `group_beam_search`
(Follow up PR)
- Fixes a bug with MaxLengthStoppingCriteria (we should stop as soon a
we hit the max_length, the comparison needs to be or equal, that affects
the tests).
- Added options to use `logits_processor` and `stopping_criteria`
directly within `generate` function (so some users can define their own
`logits_processor` and `stopping_criteria`).
- Modified the backward compat tests to make sure we issue a warning.
* Fix `max_length` argument in `generate`.
* Moving validate to being functional.
- Renamed `smax_length` to `stoppping_max_length`.
* Removing `logits_processor` and `stopping_criteria` from `generate`
arguments.
* Deepcopy.
* Fix global variable name.
* initial changes
* modified evaluation
* updated evaluation
* updated evaluation on text translation example script
* added translation example script
* Formatted translation example script
* Reformatted translation example
* Fixed evaluation bug and added support for other tokenisers
* Fixed evaluation bug and added support for other tokenisers
* Added translation example script
* Formatted summarization example script
* Removed typos from summarization example script
* Update language_modeling.py
in "class TextDatasetForNextSentencePrediction(Dataset)", double considering "self.tokenizer.num_special_tokens_to_add(pair=True)"
so, i remove self.block_size, and add parameter for "def create_examples_from_document". like "class LineByLineWithSOPTextDataset" do
* Update language_modeling.py
* Bulk of the work
* Polish and tests
* Update QA Trainer
* Avoid breaking the predict method
* Deprecation warnings
* Store real eval dataloder
* Get eval dataset reference before wrap
* [WIP] Enabling multilingual models for translation pipelines.
* decoder_input_ids -> forced_bos_token_id
* Improve docstring.
* Rebase
* Fixing 2 bugs
- Type token_ids coming from `_parse_and_tokenize`
- Wrong index from tgt_lang.
* Fixing black version.
* Adding tests for _build_translation_inputs and add them for all
tokenizers.
* Mbart actually puts the lang code at the end.
* Fixing m2m100.
* Adding TF support to `deep_round`.
* Update src/transformers/pipelines/text2text_generation.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Adding one line comment.
* Fixing M2M100 `_build_translation_input_ids`, and fix the call site.
* Fixing tests + deep_round -> nested_simplify
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Start writing BERT-Japanese doc
* Fix typo, Update toctree
* Modify model file to use comment for document, Add examples
* Clean bert_japanese by make style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Split a big code block into two
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add prefix >>> to all lines in code blocks
* Clean bert_japanese by make fixup
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* First draft of deit
* More improvements
* Remove DeiTTokenizerFast from init
* Conversion script works
* Add DeiT to ViT conversion script
* Add tests, add head model, add support for deit in vit conversion script
* Update model checkpoint names
* Update image_mean and image_std, set resample to bicubic
* Improve docs
* Docs improvements
* Add DeiTForImageClassificationWithTeacher to init
* Address comments by @sgugger
* Improve feature extractors
* Make fix-copies
* Minor fixes
* Address comments by @patil-suraj
* All models uploaded
* Fix tests
* Remove labels argument from DeiTForImageClassificationWithTeacher
* Fix-copies, style and quality
* Fix tests
* Fix typo
* Multiple docs improvements
* More docs fixes
* Added documentation for data collator.
* Update docs/source/data_collator.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Added documentation for data collator.
* Added documentation for the data collator.
* Merge branch 'doc_DataCollator' of C:\Users\mahii\PycharmProjects\transformers with conflicts.
* Update documentation for the data collator.
* Update documentation for the data collator.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Amna <A.A.Ahmad@student.tudelft.nl>
* Add a special tokenizer for CPM model
* make style
* fix
* Add docs
* styles
* cpm doc
* fix ci
* fix the overview
* add test
* make style
* typo
* Custom tokenizer flag
* Add REAMDE.md
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Change duplicated LogitsProcessor to LogitsWarper in LogitsProcessorList document
* Write more detailed information about LogitsProcessor's scores argument
* apply suggestion from review
* style
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* added model_kwargs to infer_framework_from_model
* added model_kwargs to tokenizer
* added use_auth_token as named parameter
* added dynamic get for use_auth_token
* AutoFeatureExtractor
* Init and first tests
* Tests
* Damn you gitignore
* Quality
* Defensive test for when not all backends are here
* Use pattern for Speech2Text models
* better names
* add attention mixin
* all slow tests in one class
* make helper methods static so we can test
* add local attention tests
* better names
* doc
* apply review suggestions
It doesn't look like using 🤗 is a great idea for printing to console. See attachment.
This PR proposes to replace 🤗 with "HuggingFace" for an exception message.
@LysandreJik
* Replace pkg_resources with importlib_metadata
Fixes#10964. The other reason for this change is that pkg_resources has been [deprecated](8fe85c22ce) in favor of importlib_metadata.
* Reduce to a single importlib_metadata import switch
* Trigger CI
Co-authored-by: Stas Bekman <stas@stason.org>
* Squash all commits into one
* Update ViTFeatureExtractor to use image_utils instead of torchvision
* Remove torchvision and add Pillow
* Small docs improvement
* Address most comments by @sgugger
* Fix tests
* Clean up conversion script
* Pooler first draft
* Fix quality
* Improve conversion script
* Make style and quality
* Make fix-copies
* Minor docs improvements
* Should use fix-copies instead of manual handling
* Revert "Should use fix-copies instead of manual handling"
This reverts commit fd4e591bce4496d41406425c82606a8fdaf8a50b.
* Place ViT in alphabetical order
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Replace is_sagemaker_distributed_available
* Merge SageMakerTrainer into Trainer
* Test with shorter condition
* Put back deleted line
* Deprecate SageMakerTrainer and SageMakerTrainingArguments
* Apply suggestions from code review
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
a fully qualified model.
We simply forgot to change the call for this one when this landed:
https://github.com/huggingface/transformers/pull/10888
It's odd that tests didn't catch that. Should we add some ?
(It's a pretty edgy test case, but it does run within the API).
* init
* first working test
* added todo for setup.py
* working test for single node multi node ddp and smd
* added tensorflow single node test
* added directory for pytorch and tensorflow due to different requirements.txt
* added directory for pytorch and tensorflow
* added comment for run_glue until it is available
* added output_dir to it
* smaller dataset to make test running faster
* adjust HP and script
* adjusted parameter for tensorflow
* refactored test scripts
* adjusted make file
* init
* first working test
* added todo for setup.py
* working test for single node multi node ddp and smd
* added tensorflow single node test
* added directory for pytorch and tensorflow due to different requirements.txt
* added directory for pytorch and tensorflow
* added comment for run_glue until it is available
* added output_dir to it
* smaller dataset to make test running faster
* adjust HP and script
* adjusted parameter for tensorflow
* refactored test scripts
* adjusted make file
* updated dlc container
* commented in all tests
* added both ecr images
* added new master branches
* debug
* added new datasets version
* init
* strange rebase bug
* removed changes
* changed min version for tests to work
* updated DLC
* added model parallel test
* removed test files
* removed test files
* tested with ned dlc
* added correct sagemaker sdk version
* adjust DLCs for official one
* reworked tests
* quality
* removed default profile added documentation to it
* added step in release for sagemaker tests
* reverted version for example script removed duplicated script and added install from master to requirements.txt
* removed mistaken .DS_Stores from mac
* fixed tests
* added Sylvains feedback
* make style
* added lysandre's feedback
* Initial commit
* Another bunch of updates
* make style quliaty + delete debug arg from bash script
* Use compue_metrics func
* Do a few fixes
* Add copyright
* Fix typos
A new argument `length_column_name` has been added to
`TrainingArguments`, with default value `"length"`. If this column
exists and `group_by_length` is `True`, the train sampler will use
it for grouping rather than computing it before training starts.
This is an optimization that allows the user to prepare data for fast
processing, preventing sequential access to the dataset as described in
issue #10909.
* Add NER example with accelerate library
* This commit contains the first (yet really unfinished)
version of a script for showing how to train HuggingFace model
with their new accelerate library.
* Fix metric calculation
* make style quality
* mv ner_no_trainer to token-classification dir
* Delete --debug flag from running script
* hf_datasets -> raw_datasets
* Make a few slight adjustments
* Add an informative comment + rewrite a help comment
* Change header
* Fix a few things
* Enforce to use fast tokenizers only
* DataCollatorWithPadding -> DataCollatorForTokenClassification
* Change bash script: python3 -> accelerate launch
* make style
* Add a few missing things (see below)
* Add a max-lenghth padding to predictions and labels to
enable accelerate gather functionality
* Add PyTorch no trainer example to the example README.md
* Remove --do-train from args as being redundant for now
* DataCollatorWithPadding -> DataCollatorForTokenClassification
* Remove some obsolete args.do_train conditions from the script
* Delete --do_train from bash running script
* Delete use_slow_tokenizer from args
* Add unintentionally removed flag --label_all_tokens
* Delete --debug flag from running script
* Added embeddings layer
* Added layoutlm layers, main model, maskedlm and token classification classes
* Added model classes to tf auto models
* Added model to PT to TF conversion script
* Added model to doc README
* Added tests
* Removed unused imports
* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
* Made tests pass!
* Fixed typos in imports and docs
* Fixed a typo in embeddings layer
* Removed imports
* Fixed formatting issues, imports, tests
* Added layoutlm layers, main model, maskedlm and token classification classes
* Added model classes to tf auto models
* Added model to PT to TF conversion script
* Removed unused imports
* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
* Made tests pass!
* Fixed typos in imports and docs
* Removed imports
* Fixed small formatting issues
* Removed duplicates import from main __init__.py
* Chnaged deafult arg to true for adding pooling layer to tf layoutlm
* Fixed formatting issues
* Style
* Added copied from to classes copied from bert
* Fixed doc strings examples to work with layoutlm inputs
* Removed PyTorch reference in doc strings example
* Added integration tests
* Cleaned up initialization file
* Updated model checkpoint identifiers
* Fixed imports
Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
the orignal code in line 246 is
```
tokenizer: Optional["PreTrainedTokenizerBase"] = None,
```
it should be
```
tokenizer: Optional[PreTrainedTokenizerBase] = None,
```
* Modify the _hp_search_setup method on the Trainer class to handle the wandb argument passed by Ray Tune to model config.
* Reformat single quotes as double quotes.
* Initial script
* Add script to properly sort imports in init.
* Add to the CI
* Update utils/custom_init_isort.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Separate scripts that change content from quality
* Move class_mapping_update to style_checks
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* fix backend tokenizer args override: key mismatch
* no touching the docs
* fix mpnet
* add mpnet to test
* fix test
Co-authored-by: theo <theo@matussie.re>
* [examples/seq2seq] fix t5 examples
This PR:
* fixes T5 examples to include `--source_prefix` - it's **not** optional. If you give it a try you will see that you get 10x worse bleu scores w/o it. w/ `27.6849`, w/ `2.374`
* added a normal translation example w/o the peculiarities of MBart and T5
* reduces the default max samples to 50 so it's much faster to test quickly
summarization seems to be broken for t5 score-wise: https://github.com/huggingface/transformers/issues/10733
@sgugger
* specify explicitly the t5 models requiring the special handling
* one more
* update the t5 summarization example to use cnn_dailymail
* move max*samples into the top level README.md
* better wording
* better wording
* Added check to ensure model name passed to from_pretrained and model are the same
* Added test to check from_pretrained throws assert error when passed an incompatiable model name
* Modified assert in from_pretrained with f-strings. Modified test to ensure desired assert message is being generated
* Added check to ensure config and model has model_type
* Fix FlauBERT heads
Co-authored-by: vimarsh chaturvedi <vimarsh chaturvedi>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* add initial script
* finish script
* add shell script example
* accept chars_to_ignor as cl arg
* align the script with other example scripts
* add torchaudio dep
* wav2vec2: support datasets other than LibriSpeech
* Formatting run_asr.py to pass code quality test
* bundled orthography options and added verbose logs
* fixing a typo in timit fine-tuning script
* update comment for clarity
* resize_lm_head and load custom vocab from file
* adding a max_duration_in_seconds filter
* do not assign `duration_filter` lambda, use a def
* log untransliterated text as well
* fix base model for arabic
* fix duration filter when target_sr is not set
* drop duration_in_seconds when unneeded
* script for wav2vec2-large-lv60-timit-asr
* fix for "tha" in arabic corpus (huggingface#10581)
* adding more options to work with common_voice
* PR feedback (huggingface#10581)
* small README change
* Create modeling_flax_eletra with code copied from modeling_flax_bert
* Add ElectraForMaskedLM and ElectraForPretraining
* Add modeling test for Flax electra and fix naming and arg in Flax Electra model
* Add documentation
* Fix code style
* Create modeling_flax_eletra with code copied from modeling_flax_bert
* Add ElectraForMaskedLM and ElectraForPretraining
* Add modeling test for Flax electra and fix naming and arg in Flax Electra model
* Add documentation
* Fix code style
* Fix code quality
* Adjust tol in assert_almost_equal due to very small difference between model output, ranging 0.0010 - 0.0016
* Remove redundant ElectraPooler
* save intermediate
* adapt
* correct bert flax design
* adapt roberta as well
* finish roberta flax
* finish
* apply suggestions
* apply suggestions
Co-authored-by: Chris Nguyen <anhtu2687@gmail.com>
* Added debug prints
* Added config
* Added prints
* Added prints
* Added extra samples to SequentialDistributedSampler
* Added extra samples to SequentialDistributedSampler
Updated SequentialDistributedSampler call
* Added deubg prints
* Removed extra prints
* Making predicitons and labels multiple of batchsize
* updated number of microbatches
* Removed extra prints
* Made start_remainder similar to DistributedSamplerWithLoop
* Minor spacing update
* Added debug prints
Added config
Added prints
Added prints
* Added extra samples to SequentialDistributedSampler
Updated SequentialDistributedSampler call
Added extra samples to SequentialDistributedSampler
Added deubg prints
Removed extra prints
Making predicitons and labels multiple of batchsize
updated number of microbatches
Removed extra prints
Squashing redundant commits
* Made start_remainder similar to DistributedSamplerWithLoop
Minor spacing update
Made start_remainder similar to DistributedSamplerWithLoop
* Test and styling
* Rename test
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* Apply black before checking copies
* Fix for class methods
* Deal with lonely brackets
* Remove debug and add forward changes
* Separate copies and fix test
* Add black as a test dependency
* [Issue template] need to update/extend who to tag
1. need to update who to tag for `tensorflow`
2. also requesting to add someone to tag for models hub issues - perhaps separate sub-entries for UI and code - e.g. I don't know who to tag for broken models: https://github.com/huggingface/transformers/issues/10726
Thanks.
* model hub instructions
* s/jplu/LysandreJik/
* [doc] [testing] extend -k section
This PR adds more examples on using `pytest -k` - I always forget that I want to use `-k A OR B` when I want several tests - I keep trying AND and it doesn't match any.
* style
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* pass hf optimizer and scheduler to deepspeed if not specified in ds config
* update
* make init_deepspeed support config dict
* fix docstring formatting
* clean up trainer's comments
* add new tests
* fix type
* composit argparse doesn't work
* style
* add a new test, rename others
* document new functionality
* complete tests, add docs
* style
* correct level
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add new methods to the doc
* must tell DS we are using a non-native optimizer
* add protection against cpu_offload + HF optimizer combo
* fix the cli overrides
* sync docs + tests
* restore AdamW
* better docs
* need new version
* no longer needed
* remove outdate information
* refactor duplicated code
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Examples version update
* Refactor a bit
* All version updates
* Fixes
* README cleanup
* Post-release/patch
* Fixes
* More fixes
* Tests
* More fixes
* Moar fixes
* Make commands and update setup
* Replace spaces with weird tabs
* Fix test
* Style
* Tests run on Docker
Co-authored-by: Morgan <funtowiczmo@gmail.com>
* Comments from code review
* Reply to itself
* Dependencies
Co-authored-by: Morgan <funtowiczmo@gmail.com>
* Update super class reference
* Update default value reference
* Update src/transformers/models/wav2vec2/feature_extraction_wav2vec2.py
* Fix format style
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [WIP] Adding new parameter to `generate`: `max_time`.
Generation by tokens number is sometimes a bit clunky because we don't
know how many tokens are good enough or even how many tokens are in
the payload (for pipelines users for instance). This leads to hard
to understand behavior.
This PR proposes a new argument `max_time` which is a float of seconds
for the allowed time for `generate` to run on.
Ideally combinations of `max_tokens=None`, `max_time=2` could be used to
generate as many tokens as possible within time budget.
NB: Another possible approach consists of passing a callback to `generate`
putting the caller in charge of the actual decision of when to stop
generating tokens. It opens the door to 'which args should we pass'
to this callback. It's hard to imagine other use-cases for this
early stopping behavior than time (that are not already covered by
parameters of generate)
* Revamp with StoppingCriteria
* Removing deprecated mentions.
* Forgot arguments to stopping criteria.
* Readding max_length it's not just used as a stopping criteria.
* Default value for `stopping_criteria`.
* Address @patrickvonplaten comments.
- More docstrings
- Actual doc
- Include in global namespace
- Remove TF work.
* Put back `max_length` (deprecation different PR).
* Doc quality.
* Fixing old behavior without `stopping_criteria` but with `max_length`.
Making sure we don't break that in the future.
* Adding more tests for possible inconsistencies between
`max_length` and `stopping_criteria`.
* Fixing the torch imports.
* Allow to pass kwargs to model's from_pretrained when using pipeline.
* Disable the use of past_keys_values for GPT2 when exporting to ONNX.
* style
* Remove comment.
* Appease the documentation gods
* Fix style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Remove special path for custom vocab files
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Expand error message
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* renamed logging to hf_logging
* changed logging from hf_logging to logging and loggin to native_logging
* removed everything trying to fix import Trainer error
* adding imports again
* added custom add_handler function to logging.py
* make style
* added remove_handler
* added another conditional to assert
* added sm to ua
* update id
* removed id
* removed comments
* added env variable
* changed variable name
* make quality happy
* added sguggers feedback
* make styling happy and remove brackets
* added sm to ua
* update id
* removed id
* removed comments
* added env variable
* changed variable name
* make quality happy
* added sguggers feedback
* make styling happy and remove brackets
* Create modeling_tf_dpr.py
* Add TFDPR
* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
last commit accidentally deleted these 4 lines, so I recover them back
* Add TFDPR
* Add TFDPR
* clean up some comments, add TF input-style doc string
* Add TFDPR
* Make return_dict=False as default
* Fix return_dict bug (in .from_pretrained)
* Add get_input_embeddings()
* Create test_modeling_tf_dpr.py
The current version is already passed all 27 tests!
Please see the test run at :
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
* fix quality
* delete init weights
* run fix copies
* fix repo consis
* del config_class, load_tf_weights
They shoud be 'pytorch only'
* add config_class back
after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion
* newline after .. note::
* import tf, np (Necessary for ModelIntegrationTest)
* slow_test from_pretrained with from_pt=True
At the moment we don't have TF weights (since we don't have official official TF model)
Previously, I did not run slow test, so I missed this bug
* Add simple TFDPRModelIntegrationTest
Note that this is just a test that TF and Pytorch gives approx. the same output.
However, I could not test with the official DPR repo's output yet
* upload correct tf model
* remove position_ids as missing keys
* create modeling_tf_rag
* add tests for tf
* add tf tests
* revert wrong pt commit
* further refactor
* further refactor
* refactor
* Update modeling_tf_rag.py
- input_processing
- fix prepare_input_for_generation (mostly fix generate bug)
- bring back from_pretrained hack in order to test generate
* delete colab pieces of code
* Show case of greedy "generate"
Temporarily change from beam_search test to greedy_search test to show case that TF and PT do get equivalent output.
* cosmetic update
* correct typos
* update
* push some progress
* make easy check
* fix rag save from pretrained
* Update src/transformers/modeling_tf_utils.py
* remove commented out lines
* delete unnecessary lines
* add simple test case for nq_checkpoint
Add nq_checkpoint test to show that current version without hack still fails
* temporarily put ugly hack back again
* Add TFRagSequenceForGeneration!!
* __init__.py , import TFRagSequenceForGeneration
* Add TFRagSequence tests!
* rag init.py - add TFRagSequenceForGeneration
* fix from_pretrained
* fix prepare_inputs_for_generation
* Beam search for RagToken!
* minor clean up
* add tf.cast in TFRagModel
* More tf.cast
* Add all remaining tests (still have issues)
* delete all T5 related
* make style
* fix load weight prefix
* fix bart
* fix return_dict for tf_rag
make all tests pass .. Hooray
* fix some tests
* fix code quality
* fix qualtiy check
* finish tests tf rag
* add tf rag to docs
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* remove TFT5 from docstring
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Delete outdated comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* improve doc strings
* add generative model classes
* fix adjust token logic
* refactor generate for TFRag
* using shape_list, not _get_shape
Co-authored-by: Julien Plu <plu.julien@gmail.com>
* axis=[1]->axis=1
* delete NEED_HELP comment
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve readability
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Indicating model is in a developing state in docstrings
As suggested by Julien
* small last changes
* apply sylvains suggestions
* finish tf rag
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix Marian decoding
Tokenizer's decode and batch_decode now accepts a new argument (use_source_tokenizer) which indicates whether the source spm should be used to decode ids. This is useful for Marian models specificallly when decoding source input ids.
* Adapt docstrings
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* offline mode start
* add specific values
* fix fallback
* add test
* better values check and range
* test that actually works
* document the offline mode
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* more strict check
* cleaner test
* pt-only test
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Refactor checkpoint name in ALBERT and ALBERT_tf
* Refactor checkpoint name in BART and BART_tf
* Refactor checkpoint name in BERT generation
* Refactor checkpoint name in Blenderbot_tf
* Refactor checkpoint name in Blenderbot_small_tf
* Refactor checkpoint name in ConvBERT AND CONVBERT_TF
* Refactor checkpoint name in CTRL AND CTRL_TF
* Refactor checkpoint name in DistilBERT AND DistilBERT_TF
* Refactor checkpoint name in DistilBERT redo
* Refactor checkpoint name in Electra and Electra_tf
* Refactor checkpoint name in FlauBERT and FlauBERT_tf
* Refactor checkpoint name in FSMT
* Refactor checkpoint name in GPT2 and GPT2_tf
* Refactor checkpoint name in IBERT
* Refactor checkpoint name in LED and LED_tf
* Refactor checkpoint name in Longformer and Longformer_tf
* Refactor checkpoint name in Lxmert and Lxmert_tf
* Refactor checkpoint name in Marian_tf
* Refactor checkpoint name in MBART and MBART_tf
* Refactor checkpoint name in MobileBERT and MobileBERT_tf
* Refactor checkpoint name in mpnet and mpnet_tf
* Refactor checkpoint name in openai and openai_tf
* Refactor checkpoint name in pegasus_tf
* Refactor checkpoint name in reformer
* Refactor checkpoint name in Roberta and Roberta_tf
* Refactor checkpoint name in SqueezeBert
* Refactor checkpoint name in Transformer_xl and Transformer_xl_tf
* Refactor checkpoint name in XLM and XLM_tf
* Refactor checkpoint name in XLNET and XLNET_tf
* Refactor checkpoint name in BERT_tf
* run make tests, style, quality, fixup
* feat: Adds three definitions to glossary from @cronoik
Needed a definition for transformer which in turn needed 2 more definitions
To do with issue https://github.com/huggingface/transformers/issues/9078
* fix: Adjusts definition of neural network to make it easier to read
* Introduce save_strategy training argument
* deprecate EvaluationStrategy
* collapse EvaluationStrategy and LoggingStrategy into a single
IntervalStrategy enum
* modify tests to use modified enum
* Fix None in add_token_positions - issue #10210
Fix None in add_token_positions related to the issue #10210
* add_token_positions fix None values in end_positions vector
add_token_positions fix None in end_positions vector as proposed by @joeddav
* Assumption of padding_idx <2 might not stand
* Use offset instead of 2
* Fix with black
* Change behavior to warning instead for backward compatibility.
* Fix with black
* Remove warning
* Make padding_idx non-required
* padding_idx fix for blenderbot
* padding_idx fix for blenderbot_small
* padding_idx fix for led
* padding_idx fix for mbart
* Remove extra whitespaces
* padding_idx fix for template
* Fix padding_idx passed to nn.Embedding mistake
* Fixed padding_idx passed to positional embedding in template
* Remove padding_idx from pytorch learned positional embeddings
* Remove accidentally added quotes
* Remove padding_idx from tf learned positional embeddings
* Remove zeroing of weights in __init__
Co-authored-by: Wang Ming Rui <mingrui.wang@C02CJTUYMD6M.local>
* convbert conversion test
* fin
* fin
* fin
* clean up tf<->pt conversion
* remove from_pt
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
Enhance resume_from_checkpoint argument of Trainer.train to accept
bool type. If True given, last saved checkpoint in self.args.output_dir
will be loaded. (#10280)
* [trainer] fix ignored columns logger
This PR fixes a confusing log entry that says:
```
The following columns in the evaluation set don't have a corresponding argument in `T5ForConditionalGeneration.forward` and have been ignored: .
```
when everything is in order.
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix run_seq2seq.py; porting DeepSpeed tests to it
* unrefactor
* defensive programming
* defensive programming 2
* port the rest of the trainer tests
* style
* a cleaner scripts dir finder
* cleanup
* Add check-ops script
* Finish to implement check_tf_ops and start the test
* Make the test mandatory only for BERT
* Update tf_ops folder
* Remove useless classes
* Add the ONNX test for GPT2 and BART
* Add a onnxruntime slow test + better opset flexibility
* Fix test + apply style
* fix tests
* Switch min opset from 12 to 10
* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Fix GPT2
* Remove extra shape_list usage
* Fix GPT2
* Address Morgan's comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Conversion from slow to fast for BPE spm vocabs contained an error.
- There is only 1 test currently (tokenizers + slow) that used the modified path
and it's reformer, which does not contain any ids modification so the
bug was silent for now.
- The real issue is that vocab variable was overloaded by
SentencePieceExtractor, leading to Slow specific vocab oddities to be
completely ignored
- The bug was reported here https://github.com/huggingface/transformers/issues/9518
- Ran the complete tokenization test suite with slow without error
(`RUN_SLOW=1 pytest -sv tests/test_tokenization_*`)
* Remove rebase error.
* Adding the fixture.
* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed,metric.compute,load_metric
* fix
* fix
* fix
* push
* fix
* everything works
* fix init
* fix
* special treatment for sepconv1d
* style
* 🙏🏽
* add doc and cleanup
* fix doc
* fix doc again
* fix doc again
* Apply suggestions from code review
* make style
* Proposal that should work
* Remove needless code
* Fix test
* Apply suggestions from code review
* remove xnli_compute_metrics, add load_dataset, load_metric, set_seed,metric.compute,load_metric
* amend README
* removed data_args.task_name and replaced with task_name = "xnli"; use split function to load train and validation dataset separately; remove __post_init__; remove flag --task_name from README.
* removed dict task_to_keys, use str "xnli" instead of variable task_name, change preprocess_function to use examples["premise"], examples["hypothesis"] directly, remove sentence1_key and sentence2_key, change compute_metrics function to cater only to accuracy metric, add condition for train_langauge is None when using dataset.load_dataset()
* removed `torch.distributed.barrier()` and `import torch` as `from_pretrained` is able to do the work; amend README
* fix rag generate and tests
* put back adjust_logits_during_generation
* tests are okay
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Unify logging with f-strings
* Get limits from MLflow rather than hardcode
* Add a check for parameter length overflow
Also constants are marked as internal
* Don't stop run in on_train_end
This causes bad behaviour when there is a seprarte validation step:
validation gets recorded as separate run.
* Fix style
* Cleaning up `ConversationalPipeline` to support more than DialoGPT.
Currently ConversationalPipeline was heavily biased towards DialoGPT
,which is the default model for this pipeline.
This PR proposes changes to put back the modifications specific to
DialoGPT into tokenizer-specific behavior wherever possible, by
creating `_build_conversation_input_ids` function that takes
conversation as input, and returns a list of ints corresponding
to the tokens. It feels natural to put here because all models
have probably different strategies to build input_ids from the
full conversation and it's the tokenizer's job to transform strings
into tokens (and vice-versa)
If `_build_conversation_input_ids` is missing, previous behavior is
used so we don't break anything so far (except for blenderbot where it's a fix).
This PR also contains a fix for too long inputs. There used
to be dead code for trying to limit the size of incoming input.
The introduced fixed is that we limit
within `_build_conversation_input_ids` to `tokenizer.model_max_length`.
It corresponds to the intent of the removed dead code and is actually
better because it corresponds to `model_max_length` which is different
from `max_length` (which is a default parameter for `generate`).
- Removed `history` logic from the Conversation as it's not relevant
anymore because tokenization logic has been moved to tokenizer.
And tokenizer cannot save any cache, and conversation cannot know
what is relevant or not.
Also it's not usable from `blenderbot` because the input_ids are
not append only (EOS tokens is always at the end).
- Added `iter_texts` method on `Conversation` because all
the code was literred with some form of this iteration of
past/generated_responses.
* Removing torch mention in types.
* Adding type checking to `_build_conversation_input_ids`.
* Fixing import in strings.
* Authorize last version of tokenizer
* Update version table
* Fix conversion of spm tokenizers and fix some hub links
* Bump tokenizers version to 0.10.1rc1
* Add script to check tokenizers conversion with XNLI
* Add some more mask_token lstrip support
* Must modify mask_token in slow tokenizers too
* Keep using the old method for Pegasus
* add missing import
Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
Adding new `encoder_no_repeat_ngram_size` to `generate`.
Blenderbot results seemed off compared to original ParlAI script:
`https://parl.ai/projects/recipes/`. Notably the model seems
to repeat a lot what was said during the conversation.
The actual problem was that `no_repeat_ngram_size` actually applies
to the `encoder_input_ids` but HF's `no_repeat_ngram_size` applies
to the previously generated ids (within the decoder). The history
conversation of blenderbot is within the `encoder` part so that
explains why HF's implementation had the repetitions.
This fix was focused on blenderbot *not* small and added tests
for those because they are quite different in configuration.
This change includes:
- Adding a new EncoderNoRepeatLogitProcessor.
- Adding 1 new arg to `generate` (`encoder_no_repeat_ngram_size`)
- Adding 1 new config parameter `encoder_no_repeat_ngram_size`.
- Adding 2 tests, one for the pipeline (high level, inputs exhibited
repeat behavior, one low level for EncoderNoRepeatLogitProcessor)
- Factored NoRepeatLogitProcessor so that logic could be reused.
Further work:
- Blenderbot conversational pipeline still does not behave correctly
as they way input is prepared within the pipeline is still incorrect
(follow up PR)
- Blenderbot allows the bot to have personas, which is done by
prepending "your personna: XXXX" to the input, this could be explored
too in a follow up PR.
@patrickvonplaten
@LysandreJik
* Update src/transformers/generation_logits_process.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/generation_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Doc quality.
* Fixing test.
* Last fixes.
* Fixing to account for batch_size.
* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/generation_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Add {decoder_,}head_mask to LED
* Fix create_custom_forward signatue in encoder
* Add head_mask to longformer
* Add head_mask to longformer to fix dependencies
of LED on Longformer.
* Not working yet
* Add mising one input in longofrmer_modeling.py
* make fix-copies
* change tokenizer requirement
* split line
* Correct typo from list to str
* improve style
* make other function pretty as well
* add comment
* correct typo
* add new test
* pass tests for tok without padding token
* Apply suggestions from code review
* Change documentation to correctly specify loss tensor size
* Change documentation to correct input format for labels
* Corrected output size of loss tensor for sequence classifier, multiple choice model and question answering
This affects Adafactor with relative_step=False and scale_parameter=True.
Updating group["lr"] makes the result of ._get_lr() depends on the previous call,
i.e., on the scale of other parameters. This isn't supposed to happen.
* MOD: fit chinese wwm to new datasets
* MOD: move wwm to new folder
* MOD: formate code
* Styling
* MOD add param and recover trainer
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* [t5 doc] typos
a few run away backticks
@sgugger
* style
* [trainer] put fp16 args together
this PR proposes a purely cosmetic change that puts all the fp16 args together - so they are easier to manager/read
@sgugger
* style
* [wandb] make WANDB_DISABLED disable wandb with any value
This PR solves part of https://github.com/huggingface/transformers/issues/9623
It tries to actually do what https://github.com/huggingface/transformers/issues/9699 requested/discussed and that is any value of `WANDB_DISABLED` should disable wandb.
The current behavior is that it has to be one of `ENV_VARS_TRUE_VALUES = {"1", "ON", "YES"}`
I have been using `WANDB_DISABLED=true` everywhere in scripts as it was originally advertised. I have no idea why this was changed to a sub-set of possible values. And it's not documented anywhere.
@sgugger
* WANDB_DISABLED=true to disable; make tf trainer consistent
* style
* Add {decoder_,}head_mask to fsmt_modeling.py
* Enable test_headmasking and some changes to docs
* Remove test_head_masking flag from fsmt test file
Remove test_head_masking flag from test_modeling_fsmt.py
since test_head_masking is set to be True by default (thus it is redundant to store).
* Merge master and remove test_head_masking = True
* Rebase necessary due to an update of jaxlib
* Remove test_head_masking=True in tests/test_modeling_fsmt.py
as it is redundant.
* TFBart lables consider both pad token and -100
* make style
* fix for all other models
Co-authored-by: kykim <kykim>
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Adding a new `return_full_text` parameter to TextGenerationPipeline.
For text-generation, it's sometimes used as prompting text.
In that context, prefixing `generated_text` with the actual input
forces the caller to take an extra step to remove it.
The proposed change adds a new parameter (for backward compatibility).
`return_full_text` that enables the caller to prevent adding the prefix.
* Doc quality.
* Remove redundant test_head_masking = True flags
* Remove all redundant test_head_masking flags in PyTorch test_modeling_* files
* Make test_head_masking = True as a default choice in test_modeling_tf_commong.py
* Remove all redundant test_head_masking flags in TensorFlow
test_modeling_tf_* files
* Put back test_head_masking=False fot TFT5 models
* Fix computation of attention_probs when head_mask is provided.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Apply changes to the template
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* fix --lr_scheduler_type choices
* rewrite to fix for all enum-based cl args
* cleanup
* adjust test
* style
* Proposal that should work
* Remove needless code
* Fix test
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
pipeline.
- If table is empty then the line that contain `answer[0]` will fail.
- This PR add a check to prevent `answer[0]`.
- Also adds an early check for presence of `table` and `query` to
prevent late failure and give better error message.
- Adds a few tests to make sure these errors are correctly raised.
* Add a debug print
* Adapt Trainer to use smdistributed if available
* Forgotten parenthesis
* Real check for sagemaker
* Donforget to define device...
* Woopsie, local)rank is defined differently
* Update since local_rank has the proper value
* Remove debug statement
* More robust check for smdistributed
* Quality
* Deal with key not present error
* Pad to 8x for fp16 multiple choice example (#9752)
* Pad to 8x for fp16 squad trainer example (#9752)
* Pad to 8x for fp16 ner example (#9752)
* Pad to 8x for fp16 swag example (#9752)
* Pad to 8x for fp16 qa beam search example (#9752)
* Pad to 8x for fp16 qa example (#9752)
* Pad to 8x for fp16 seq2seq example (#9752)
* Pad to 8x for fp16 glue example (#9752)
* Pad to 8x for fp16 new ner example (#9752)
* update script template #9752
* Update examples/multiple-choice/run_swag.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update examples/question-answering/run_qa.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update examples/question-answering/run_qa_beam_search.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* improve code quality #9752
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* We most likely don't want special tokens in this output.
* Adding `skip_special_tokens=True` to FillMaskPipeline
- It's backward incompatible.
- It makes for sense for pipelines to remove references to
special_tokens (all of the other pipelines do that).
- Keeping special tokens makes it hard for users to actually remove them
because all models have different tokens (<s>, <cls>, [CLS], ....)
* Fixing `token_str` in the same vein, and actually fix the tests too !
* Add head_mask/decoder_head_mask for TF BART models
* Add head_mask and decoder_head_mask input arguments for TF BART-based
models as a TF counterpart to the PR #9569
* Add test_headmasking functionality to tests/test_modeling_tf_common.py
* TODO: Add a test to verify that we can get a gradient back for
importance score computation
* Remove redundant #TODO note
Remove redundant #TODO note from tests/test_modeling_tf_common.py
* Fix assertions
* Make style
* Fix ...Model input args and adjust one new test
* Add back head_mask and decoder_head_mask to BART-based ...Model
after the last commit
* Remove head_mask ande decoder_head_mask from input_dict
in TF test_train_pipeline_custom_model as these two have different
shape than other input args (Necessary for passing this test)
* Revert adding global_rng in test_modeling_tf_common.py
* Auto-resume training from checkpoint
* Update examples/text-classification/run_glue.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Roll out to other examples
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Fix Trainer and Args to mention AdamW, not Adam.
* Update the docs for Training Arguments.
* Change arguments adamw_* to adam_*
* Fixed links to AdamW in TrainerArguments docs
* Fix line length in Training Args docs.
* Add decoder_head_mask for PyTorch T5 model
* Add decoder_head_mask args into T5Model and T5ForConditionalGeneration
* Slightly change the order of input args to be in accordance
with the convention from BART-based models introduced within the PR #9569.
* Make style for modeling_t5.py
* Add decoder_head_mask for TF T5 models
* Separate head_mask and decoder_head_mask args in TF T5 models
* Slightly change the order of input args to follow convention
of BART-based models updated in PR #9569
* Update test_forward_signature tests/test_modeling_tf_common.py
w.r.t. the changed order of input args
* Add FutureWarnings for T5 and TFT5 models
* Add FutureWarnings for T5 and TFT5 models warning a user that
input argument `head_mask` was split into two arguments -
`head_mask` and `decoder_head_mask`
* Add default behaviour - `decoder_head_mask` is set to copy
`head_mask`
* Fix T5 modeling and FutureWarning
* Make proper usage of head_mask and decoder_head_mask
in cross_attention
* Fix conditions for raising FutureWarning
* Reformat FutureWarning in T5 modeling
* Refactor the warning message
File "/share/apps/anaconda3/envs/my_env/lib/python3.7/site-packages/transformers/integrations.py", line 419, in __init__
self._SummaryWriter = SummaryWriter
UnboundLocalError: local variable 'SummaryWriter' referenced before assignment
* Update past_key_values in gpt2 (#9391)
* Update generation_utils, and rename some items
* Update modeling_gpt2 to avoid an error in gradient_checkpointing
* Remove 'reorder_cache' from util and add variations to XLNet, TransfoXL, GPT-2
* Change the location of '_reorder_cache' in modeling files
* Add '_reorder_cache' in modeling_ctrl
* Fix a bug of my last commit in CTRL
* Add '_reorder_cache' to GPT2DoubleHeadsModel
* Manage 'use_cache' in config of test_modeling_gpt2
* Clean up the doc string
* Update src/transformers/models/gpt2/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Fix the doc string (GPT-2, CTRL)
* improve gradient_checkpointing_behavior
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add head_mask/decoder_head_mask for BART
This branch implement head_mask and decoder_head_mask
for BART-based models. Full list below:
- BART
- MBart
- Blenderbot
- BlenderbotSmall
- Marian
- Pegasus
Everything is accompanied with updated testing.
* Fix test_headmasking for BART models
* Fix text_headmasking for BART-like models
which has only 2 layers in each modules.
The condition
```
self.assertNotEqual(attentions[1][..., 0, :, :].flatten().sum().item(), 0.0)
```
is, therefore, invalid for encoder-decoder models considering
the `head_mask`
```
head_mask = torch.ones(
self.model_tester.num_hidden_layers,
self.model_tester.num_attention_heads,
device=torch_device,
)
head_mask[0, 0] = 0
head_mask[-1, :-1] = 0
```
specified in the `test_headmasking` test/function.
* Adjust test_modeling_common.py to reflect T5 input args
* Update tests/test_modeling_common.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make style
* make fix-copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Switch metrics in run_ner to datasets
* Add flag to return all metrics
* Upstream (and rename) sortish_sampler
* Revert "Upstream (and rename) sortish_sampler"
This reverts commit e07d0dcf650c2bae36da011dd76c77a8bb4feb0d.
* Update run_glue for do_predict with local test data (#9442)
* Update run_glue (#9442): fix comments ('files' to 'a file')
* Update run_glue (#9442): reflect the code review
* Update run_glue (#9442): auto format
* Update run_glue (#9442): reflect the code review
* Use the right version of tokenizers
* Try another way
* Try another way
* Deps are installed from there...
* Deps are installed from there...
* Revert last
* remove needless comment
* Add target contextmanager and rework prepare_seq2seq_batch
* Fix tests, treat BART and Barthez
* Add last tokenizers
* Fix test
* Set src token before calling the superclass
* Remove special behavior for T5
* Remove needless imports
* Remove needless asserts
* Add LayoutLMForSequenceClassification and integration tests
Improve docs
Add LayoutLM notebook to list of community notebooks
* Make style & quality
* Address comments by @sgugger, @patrickvonplaten and @LysandreJik
* Fix rebase with master
* Reformat in one line
* Improve code examples as requested by @patrickvonplaten
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Enable TruncationStrategy override for pipelines
* Update isort.
* Fixing test
* Fixing text_generation pipeline.
* Using same DummyTok as other PR for easier merge later.
* Some more import guards.
* Remove bogus file.
* Do not pass `generate_kwargs` to `_parse_and_tokenize`.
@patrickvonplaten
* Removed DummyTok.
* Doc quality.
* Cleaning up conversation tests.
* Adding tests that don't require downloading models + conversation can be
fully created from static state.
* Making tests non flaky (by fixing generation length)
* Bumping isort version.
* Doc cleanup.
* Remove unused test in this PR.
* Torch import guard for TF.
* Missing torch guard.
* Small mistake in doc.
* Actual uses `_history` and `_index` cache.
+ remove dead enumerate
+ improve warning message.
* Update src/transformers/pipelines/conversational.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/pipelines/conversational.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/pipelines/conversational.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Adding comments and cleaner code to address history copy.
* Improving pipeline name in tests.
* Change tokenizer to a real one (still created at runtime with no
external dependency)
* Simplify DummyTok, reverse changes on tokenization.
* Removing DummyTok.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Python 3.9 changed the format of the string serialization of `typing.Optional`.
For example, `str(typing.Optional[str])` is
`typing.Union[str, NoneType]` in python 3.8 and
`typing.Optional[str]` in Python 3.9.
* Merging all duplicated codes for Text2TextPipeline while preserving
backward compat.
* Fixing TranslationPipeline Hierarchy + return_name
* torch import guard.
* Update isort version.
* Remove code from other PR disentanglement.
* Removed named example to something more agnostic.
* Main init work
* Add version
* Change from absolute to relative imports
* Fix imports
* One more typo
* More typos
* Styling
* Make quality script pass
* Add necessary replace in template
* Fix typos
* Spaces are ignored in replace for some reason
* Forgot one models.
* Fixes for import
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* Add documentation
* Styling
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* Don't import libs to check they are available
* Don't import integrations at init
* Add importlib_metdata to deps
* Remove old vars references
* Avoid syntax error
* Adapt testing utils
* Try to appease torchhub
* Add dependency
* Remove more private variables
* Fix typo
* Another typo
* Refine the tf availability test
* Define new output dataclasses for greedy generation
* Add output_[...] flags in greedy generation methods
Added output_attentions, output_hidden_states, output_scores flags in
generate and greedy_search methods in GenerationMixin.
* [WIP] Implement logic and tests for output flags in generation
* Update GreedySearchOutput classes & docstring
* Implement greedy search output accumulation logic
Update greedy_search unittests
Fix generate method return value docstring
Properly init flags with the default config
* Update configuration to add output_scores flag
* Fix test_generation_utils
Sort imports and fix isinstance tests for GreedySearchOutputs
* Fix typo in generation_utils
* Add return_dict_in_generate for backwards compatibility
* Add return_dict_in_generate flag in config
* Fix tyPo in configuration
* Fix handling of attentions and hidden_states flags
* Make style & quality
* first attempt attentions
* some corrections
* improve tests
* special models requires special test
* disable xlm test for now
* clean tests
* fix for tf
* isort
* Add output dataclasses for other generation methods
* Add logic to return dict in sample generation
* Complete test for sample generation
- Pass output_attentions and output_hidden_states flags to encoder in
encoder-decoder models
- Fix import satements order in test_generation_utils file
* Add logic to return dict in sample generation
- Refactor tests to avoid using self.assertTrue, which provides
scarce information when the test fails
- Add tests for the three beam_search methods: vanilla, sample and
grouped
* Style doc
* Fix copy-paste error in generation tests
* Rename logits to scores and refactor
* Refactor group_beam_search for consistency
* make style
* add sequences_scores
* fix all tests
* add docs
* fix beam search finalize test
* correct docstring
* clean some files
* Made suggested changes to the documentation
* Style doc ?
* Style doc using the Python util
* Update src/transformers/generation_utils.py
* fix empty lines
* fix all test
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Store transformers version info when saving the model
* Store transformers version info when saving the model
* fix format
* fix format
* fix format
* Update src/transformers/configuration_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update configuration_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* first commit
* change phobert to phoBERT as per author in overview
* v3 and v4 both runs on same code hence there is no need to differentiate them
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [t5 doc] typos
a few run away backticks
@sgugger
* style
* [trainer] put fp16 args together
this PR proposes a purely cosmetic change that puts all the fp16 args together - so they are easier to manager/read
@sgugger
* style
* create model
* add integration
* save current state
* make integration tests pass
* add one more test
* add explanation to tests
* remove from bart
* add padding
* remove unnecessary test
* make all tests pass
* re-add cookie cutter tests
* finish PyTorch
* fix attention test
* Update tests/test_modeling_common.py
* revert change
* remove unused file
* add string to doc
* save intermediate
* make tf integration tests pass
* finish tf
* fix doc
* fix docs again
* add led to doctree
* add to auto tokenizer
* added tips for led
* make style
* apply jplus statements
* correct tf longformer
* apply lysandres suggestions
* apply sylvains suggestions
* Apply suggestions from code review
* Use extlinks to point hyperlink with the version of code
* Point to version on release and master until then
* Apply style
* Correct links
* Add missing backtick
* Simple missing backtick after all.
Co-authored-by: Raghavendra Sugeeth P S <raghav-5305@raghav-5305.csez.zohocorpin.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* --model_parallel hasn't been implemented for most models
* make the help clear as well
* implement is_parallelizable; use it
* oops
* remove property
This PR proposes to:
* auto-flush `transformers` logging
When using logging for tracing signals from different parts of the code and which could be mixed with print debug this aids to get all the logging events synchronized.
I don't think this change will introduce any performance impacts.
If it helps someone here is the code I used to sync `transformers` logging with various other debug prints.
I was porting bart to MP and I needed to trace that the device switching happens correctly and I added a bunch of logger.info calls inside `modeling_bart.py` and also had some other helpers `print` debug messages which weren't logger based:
```
# auto flush std streams
from sys import stdout, stderr
def stdout_write_flush(args, w=stderr.write): w(args); stderr.flush()
def stderr_write_flush(args, w=stderr.write): w(args); stderr.flush()
stdout.write = stdout_write_flush
stderr.write = stderr_write_flush
from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
import logging
import transformers.utils.logging
import transformers.models.bart.modeling_bart
# I wanted a shorter simpler format
handlers = transformers.utils.logging._get_library_root_logger().handlers
for handler in handlers:
formatter = logging.Formatter("[%(funcName)s] %(message)s")
handler.setFormatter(formatter)
transformers.models.bart.modeling_bart.logger.setLevel(transformers.logging.INFO)
```
@LysandreJik, @sgugger, @patrickvonplaten
This PR:
* fixes trainer to have the logger agree with the actual default `output_dir`, but setting it one place and passing it as an argument to both places
@sgugger
* fix a bug in eval_batch_retrieval
* should return parser as well as other staticmethod
* remove duplicate argument
* these kwargs are no longer accepted (cause TypeError in self.generator.generate of modeling_rag.py)
* fixed file paths in README
* moved an arg to add_ray_specific_args
```
python -c "from apex.normalization import FusedProphetNetLayerNorm"
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: cannot import name 'FusedProphetNetLayerNorm' from 'apex.normalization' (/home/stas/anaconda3/envs/main-38/lib/python3.8/site-packages/apex/normalization/__init__.py)
```
It looks like this code has never been tested, so it silently fails inside try/except.
Discovered this by accident in https://github.com/huggingface/transformers/issues/9338#issuecomment-752217708
* Create modeling_tf_dpr.py
* Add TFDPR
* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
last commit accidentally deleted these 4 lines, so I recover them back
* Add TFDPR
* Add TFDPR
* clean up some comments, add TF input-style doc string
* Add TFDPR
* Make return_dict=False as default
* Fix return_dict bug (in .from_pretrained)
* Add get_input_embeddings()
* Create test_modeling_tf_dpr.py
The current version is already passed all 27 tests!
Please see the test run at :
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
* fix quality
* delete init weights
* run fix copies
* fix repo consis
* del config_class, load_tf_weights
They shoud be 'pytorch only'
* add config_class back
after removing it, test failed ... so totally only removing "use_tf_weights = None" on Lysandre suggestion
* newline after .. note::
* import tf, np (Necessary for ModelIntegrationTest)
* slow_test from_pretrained with from_pt=True
At the moment we don't have TF weights (since we don't have official official TF model)
Previously, I did not run slow test, so I missed this bug
* Add simple TFDPRModelIntegrationTest
Note that this is just a test that TF and Pytorch gives approx. the same output.
However, I could not test with the official DPR repo's output yet
* upload correct tf model
* remove position_ids as missing keys
* fix RagSeq generate with context_input_ids
fix RagSeq generate with context_input_ids
* apply style
* delete unused lines
* Add test_rag_sequence_generate_batch_from_context_input_ids
* Readability improved
* stylying
* Stylize
* typos
* add check_model_generate_from_context_input_ids
* make style
* Apply suggestions from code review
* make style2
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
* add past_key_values
* add use_cache option
* make mask before cutting ids
* adjust position_ids according to past_key_values
* flatten past_key_values
* fix positional embeds
* fix _reorder_cache
* set use_cache to false when not decoder, fix attention mask init
* add test for caching
* add past_key_values for Roberta
* fix position embeds
* add caching test for roberta
* add doc
* make style
* doc, fix attention mask, test
* small fixes
* adress patrick's comments
* input_ids shouldn't start with pad token
* use_cache only when decoder
* make consistent with bert
* make copies consistent
* add use_cache to encoder
* add past_key_values to tapas attention
* apply suggestions from code review
* make coppies consistent
* add attn mask in tests
* remove copied from longformer
* apply suggestions from code review
* fix bart test
* nit
* simplify model outputs
* fix doc
* fix output ordering
* Add label smoothing in Trainer
* Add options for scheduler and Adafactor in Trainer
* Put Seq2SeqTrainer in the main lib
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments and adapt scripts
* Documentation
* Move test not using script to tests folder
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update the README of the text classification example
* Update examples/README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Adapt comment from review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add new run_swag example
* Add check
* Add sample
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Very important change to make Lysandre happy
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
setuptools has a pretty fixed expectation of version numbers.
This PR fixes the dev version number and adds a comment with correct formats for the future editors
This fix removes this warning on `make fixup|style|etc` or any other time `setup.py` is being run.
```
setuptools/dist.py:452: UserWarning: Normalizing '4.2.0dev0' to '4.2.0.dev0'
warnings.warn(tmpl.format(**locals()))
```
and the alternative:
```
/setuptools/dist.py:452: UserWarning: Normalizing '4.0.0-rc-1' to '4.0.0rc1
```
Fixes: #8749
@LysandreJik, @sgugger
* First commit: adding all files from tapas_v3
* Fix multiple bugs including soft dependency and new structure of the library
* Improve testing by adding torch_device to inputs and adding dependency on scatter
* Use Python 3 inheritance rather than Python 2
* First draft model cards of base sized models
* Remove model cards as they are already on the hub
* Fix multiple bugs with integration tests
* All model integration tests pass
* Remove print statement
* Add test for convert_logits_to_predictions method of TapasTokenizer
* Incorporate suggestions by Google authors
* Fix remaining tests
* Change position embeddings sizes to 512 instead of 1024
* Comment out positional embedding sizes
* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
* Added more model names
* Fix truncation when no max length is specified
* Disable torchscript test
* Make style & make quality
* Quality
* Address CI needs
* Test the Masked LM model
* Fix the masked LM model
* Truncate when overflowing
* More much needed docs improvements
* Fix some URLs
* Some more docs improvements
* Test PyTorch scatter
* Set to slow + minify
* Calm flake8 down
* First commit: adding all files from tapas_v3
* Fix multiple bugs including soft dependency and new structure of the library
* Improve testing by adding torch_device to inputs and adding dependency on scatter
* Use Python 3 inheritance rather than Python 2
* First draft model cards of base sized models
* Remove model cards as they are already on the hub
* Fix multiple bugs with integration tests
* All model integration tests pass
* Remove print statement
* Add test for convert_logits_to_predictions method of TapasTokenizer
* Incorporate suggestions by Google authors
* Fix remaining tests
* Change position embeddings sizes to 512 instead of 1024
* Comment out positional embedding sizes
* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
* Added more model names
* Fix truncation when no max length is specified
* Disable torchscript test
* Make style & make quality
* Quality
* Address CI needs
* Test the Masked LM model
* Fix the masked LM model
* Truncate when overflowing
* More much needed docs improvements
* Fix some URLs
* Some more docs improvements
* Add add_pooling_layer argument to TapasModel
Fix comments by @sgugger and @patrickvonplaten
* Fix issue in docs + fix style and quality
* Clean up conversion script and add task parameter to TapasConfig
* Revert the task parameter of TapasConfig
Some minor fixes
* Improve conversion script and add test for absolute position embeddings
* Improve conversion script and add test for absolute position embeddings
* Fix bug with reset_position_index_per_cell arg of the conversion cli
* Add notebooks to the examples directory and fix style and quality
* Apply suggestions from code review
* Move from `nielsr/` to `google/` namespace
* Apply Sylvain's comments
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Rogge Niels <niels.rogge@howest.be>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
* trainer and finetune_trainer enhancements and fixes
* add fallback default
* move the fixing of incorrect keys back into finetune trainer
* s/eval/val/ to match the split
* trainer can now use a different prefix than eval_ for metrics
* document new arg
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* use 'eval' as the default for metric_key_prefix
* complete adjust var names + disambiguate
* fix logger
* add clarifying comment
* add clarifying comment
* style
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/trainer.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* complete removal of optional for metric_key_prefix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add model parallelism to T5EncoderModel
add model parallelism to T5EncoderModel
* remove decoder from T5EncoderModel parallelize
* uodate T5EncoderModel docs
* Extend T5ModelTest for T5EncoderModel
* fix T5Stask using range for get_device_map
* fix style
Co-authored-by: Ahmed Elnaggar <elnaggar@rostlab.informatik.tu-muenchen.de>
* Resize the biases in same time than the embeddings
* Trigger CI
* Biases are not reset anymore
* Remove get_output_embeddings + better LM model detection in generation utils
* Apply style
* First test on BERT
* Update docstring + new name
* Apply the new resizing logic to all the models
* fix tests
* Apply style
* Update the template
* Fix naming
* Fix naming
* Apply style
* Apply style
* Remove unused import
* Revert get_output_embeddings
* Trigger CI
* Update num parameters
* Restore get_output_embeddings in TFPretrainedModel and add comments
* Style
* Add decoder resizing
* Style
* Fix tests
* Separate bias and decoder resize
* Fix tests
* Fix tests
* Apply style
* Add bias resizing in MPNet
* Trigger CI
* Apply style
* Reorganize example folder
* Continue reorganization
* Change requirements for tests
* Final cleanup
* Finish regroup with tests all passing
* Copyright
* Requirements and readme
* Make a full link for the documentation
* Address review comments
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add symlink
* Reorg again
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Adapt title
* Update to new strucutre
* Remove test
* Update READMEs
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Create README.md
Initial README for `t5-base-indonesian-summarization-cased` model
* Update README for t5-base-indonesian-summarization-cased
Typo in README, change from `small` to `base`
* ci-doc-job-skip-take-4
* wip
* wip
* wip
* wip
* skip yaml
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* wip
* ready to test
* yet another way
* trying with HEAD
* trying with head.sha
* trying with head.sha fix
* trying with head.sha fix wip
* undo
* try to switch to sha
* current branch
* current branch
* PR number check
* joy ride
* joy ride
* joy ride
* joy ride
* joy ride
* joy ride
* joy ride
* joy ride
* joy ride
* joy ride
* joy ride
* joy ride
* remove make on the fly linear embedding
* start refactor
* big first refactor
* save intermediate
* save intermediat
* correct mask issue
* save tests
* refactor padding masks
* make all tests pass
* further refactor
* make pegasus test pass
* fix bool if
* fix leftover tests
* continue
* bart renaming
* delete torchscript test hack
* fix imports in tests
* correct shift
* fix docs and repo cons
* re-add fix for FSTM
* typo in test
* fix typo
* fix another typo
* continue
* hot fix 2 for tf
* small fixes
* refactor types linting
* continue
* finish refactor
* fix import in tests
* better bart names
* further refactor and add test
* delete hack
* apply sylvains and lysandres commens
* small perf improv
* further perf improv
* improv perf
* fix typo
* make style
* small perf improv
* Remove "Model" suffix from Flax models to look more 🤗
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Initial working (forward + backward) for Flax MLM training example.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Simply code
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing comments, using module and moving to LM task.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Restore parameter name "module" wrongly renamed model.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Restore correct output ordering...
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Actually commit the example 😅
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add FlaxBertModelForMaskedLM after rebasing.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make it possible to initialize the training from scratch
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Reuse flax linen example of cross entropy loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added specific data collator for flax
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove todo for data collator
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added evaluation step
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added ability to provide dtype to support bfloat16 on TPU
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable flax tensorboard output
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable jax.pmap support.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Ensure batches are correctly sized to be dispatched with jax.pmap
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable bfloat16 with --fp16 cmdline args
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correctly export metrics to tensorboard
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added dropout and ability to use it.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Effectively enable & disable during training and evaluation steps.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Oops.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Enable specifying kernel initializer scale
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added warmup step to the learning rate scheduler.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix typo.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Print training loss
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix linter issue (flake8)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix model matching
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix dummies
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix non default dtype on Flax models
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use the same create_position_ids_from_input_ids for FlaxRoberta
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make Roberta attention as Bert
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix copy
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Wording.
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
* diverse beam search
* bug fixes
* bug fixes
* bug fix
* separate out diverse_beam_search function
* separate out diverse_beam_search function
* bug fix
* improve code quality
* bug fix
* bug fix
* separate out diverse beam search scorer
* code format
* code format
* code format
* code format
* add test
* code format
* documentation changes
* code quality
* add slow integration tests
* more general name
* refactor into logits processor
* add test
* avoid too much copy paste
* refactor
* add to docs
* fix-copies
* bug fix
* Revert "bug fix"
This reverts commit c99eb5a8dc57a7b0d33a8ac06d8c6a32a7812ad4.
* improve comment
* implement sylvains feedback
Co-authored-by: Ayush Jain <a.jain@sprinklr.com>
Co-authored-by: ayushtiku5 <40797286+ayushtiku5@users.noreply.github.com>
* Add new SQUAD example
* Same with a task-specific Trainer
* Address review comment.
* Small fixes
* Initial work for XLNet
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Final clean up and working XLNet script
* Test and debug
* Final working version
* Add new SQUAD example
* Same with a task-specific Trainer
* Address review comment.
* Small fixes
* Initial work for XLNet
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Final clean up and working XLNet script
* Test and debug
* Final working version
* Add tick
* Update README
* Address review comments
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Removed unused `encoder_hidden_states` and `encoder_attention_mask` from MobileBert
* Removed decoder tests for MobileBert
* Removed now unnecessary import
* [training] SAVE_STATE_WARNING was removed in pytorch
FYI `SAVE_STATE_WARNING` has been removed 3 days ago: pytorch/pytorch#46813
Fixes: #8232
@sgugger
* style, but add () to prevent autoformatters from botching it
* switch to try/except
* cleanup
* initial commit
* [cli] lfs commands
* Fix FileSlice
* Tweak to FileSlice
* [hf_api] Backport filetype arg from `datasets`
cc @lhoestq
* Silm down the CI while i'm working
* Ok let's try this in CI
* Update config.yml
* Do not try this at home
* one more try
* Update lfs.py
* Revert "Tweak to FileSlice"
This reverts commit d7e32c4b3500400486411e85a2b74e57fb6b52f5.
* Update test_hf_api.py
* Update test_hf_api.py
* Update test_hf_api.py
* CI still green?
* make CI green again?
* Update test_hf_api.py
* make CI red again?
* Update test_hf_api.py
* add CI style back
* Fix CI?
* oh my
* doc + switch back to real staging endpoint
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
* Fix docblock + f-strings
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
* Add TFGPT2ForSequenceClassification based on DialogRPT
* Add TFGPT2ForSequenceClassification based on DialogRPT
* TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing
* Add TFGPT2ForSequenceClassification based on DialogRPT
* TFGPT2ForSequenceClassification based on DialogRPT-refactored code, implemented review comments and added input processing
* code refactor for latest other TF PR
* code refactor
* code refactor
* Update modeling_tf_gpt2.py
Without this fix, training a `BARTForSequenceClassification` model with `run_pl_glue.py` gives `TypeError: forward() got an unexpected keyword argument 'token_type_ids'`, because BART does not have token_type_ids. I've solved this issue in the same way as it's solved for the "distilbert" model, and I can train BART models on SNLI without errors now.
* Add badge w/ number of models on the hub
* try to apease @sgugger 😇
* not sure what this `c` was about [ci skip]
* Fix script and move stuff around
* Fix doc styling error
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* [trainer] improve code
This PR:
- removes redundant code
```
self.model = model if model is not None else None
```
and
```
self.model = model
```
are the same.
* separate attribute assignment from code logic - which simplifies things further.
* whitespace
* Warning about too long input for fast tokenizers too
If truncation is not set in tokenizers, but the tokenization is too long
for the model (`model_max_length`), we used to trigger a warning that
The input would probably fail (which it most likely will).
This PR re-enables the warning for fast tokenizers too and uses common
code for the trigger to make sure it's consistent across.
* Checking for pair of inputs too.
* Making the function private and adding it's doc.
* Remove formatting ?? in odd place.
* Missed uppercase.
* restore skip
* Revert "Remove deprecated `evalutate_during_training` (#8852)"
This reverts commit 553029909620455e040a49032a9c45f6a5f0cd52.
* check that pipeline.git.base_revision is defined before proceeding
* Revert "Revert "Remove deprecated `evalutate_during_training` (#8852)""
This reverts commit dfec84db3fdce1079f01f1bc8dfaf21db2ccaba1.
* check that pipeline.git.base_revision is defined before proceeding
* doc only
* doc + code
* restore
* restore
* typo
* fix DP case on multi-gpu
* make executable
* test all 3 modes
* use the correct check for distributed
* dp doesn't need a special case
* restore original name
* cleanup
* NerPipeline (TokenClassification) now outputs offsets of words
- It happens that the offsets are missing, it forces the user to pattern
match the "word" from his input, which is not always feasible.
For instance if a sentence contains the same word twice, then there
is no way to know which is which.
- This PR proposes to fix that by outputting 2 new keys for this
pipelines outputs, "start" and "end", which correspond to the string
offsets of the word. That means that we should always have the
invariant:
```python
input[entity["start"]: entity["end"]] == entity["entity_group"]
# or entity["entity"] if not grouped
```
* Fixing doc style
* Slightly increase tolerance between pytorch and flax output
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* test_multiple_sentences doesn't require torch
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Simplify parameterization on "jit" to use boolean rather than str
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use `require_torch` on `test_multiple_sentences` because we pull the weight from the hub.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Rename "jit" parameter to "use_jit" for (hopefully) making it self-documenting.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove pytest.mark.parametrize which seems to fail in some circumstances
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix unused imports.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Give default parameters values for traced model.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Review comment: Change sentences to sequences
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Use model.from_pretrained for DataParallel also
When training on multiple GPUs, the code wraps a model with torch.nn.DataParallel. However if the model has custom from_pretrained logic, it does not get applied during load_best_model_at_end.
This commit uses the underlying model during load_best_model_at_end, and re-wraps the loaded model with DataParallel.
If you choose to reject this change, then could you please move the this logic to a function, e.g. def load_best_model_checkpoint(best_model_checkpoint) or something, so that it can be overridden?
* Fix silly bug
* Address review comments
Thanks for the feedback. I made the change that you proposed, but I also think we should update L811 to check if `self.mode` is an instance of `PreTrained`, otherwise we would still not get into that `if` section, right?
* Fix decoder not returning hidden states from the last layer
* Resolve conflict
* Change the way to gather hidden states
* Add decoder hidden states test
* Make pytest and black happy
* Remove redundant line
* remove new line
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* update configuration_utils.py typing to allow pathlike objects when sensible
* update modeling_utils.py typing to allow pathlike objects when sensible
* black
* update tokenization_utils_base.py typing to allow pathlike objects when sensible
* update tokenization_utils_fast.py typing to allow pathlike objects when sensible
* update configuration_auto.py typing to allow pathlike objects when sensible
* update configuration_auto.py docstring to allow pathlike objects when sensible
* update tokenization_auto.py docstring to allow pathlike objects when sensible
* black
* fix mems in xlnet
* fix use_mems
* fix use_mem_len
* fix use mems
* clean docs
* fix tf typo
* make xlnet tf for generation work
* fix tf test
* refactor use cache
* add use cache for missing models
* correct use_cache in generate
* correct use cache in tf generate
* fix tf
* correct getattr typo
* make sylvain happy
* change in docs as well
* do not apply to cookie cutter statements
* fix tf test
* make pytorch model fully backward compatible
* bart output hidden states upstream
* same w/ decoder
* add tests
* fix prophetnet
* fix gpt2 and ctrl
* fix fstm and skip test for reformer and longformer
* fix all models
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* implement support for run-time dependency version checking
* try not escaping !
* use findall that works on py36
* small tweaks
* autoformatter worship
* simplify
* shorter names
* add support for non-versioned checks
* add deps
* revert
* tokenizers not required, check version only if installed
* make a proper distutils cmd and add make target
* tqdm must be checked before tokenizers
* workaround the DistributionNotFound peculiar setup
* handle the rest of packages in setup.py
* fully sync setup.py's install_requires - to check them all
* nit
* make install_requires more readable
* typo
* Update setup.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* restyle
* add types
* simplify
* simplify2
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* added instructions for syncing upstream master with forked master via PR
* expand to add a note to why this is requested
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Support BERT relative position embeddings
* Fix typo in README.md
* Address review comment
* Fix failing tests
* [tiny] Fix style_doc.py check by adding an empty line to configuration_bert.py
* make fix copies
* fix configs of electra and albert and fix longformer
* remove copy statement from longformer
* fix albert
* fix electra
* Add bert variants forward tests for various position embeddings
* [tiny] Fix style for test_modeling_bert.py
* improve docstring
* [tiny] improve docstring and remove unnecessary dependency
* [tiny] Remove unused import
* re-add to ALBERT
* make embeddings work for ALBERT
* add test for albert
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add early stopping patience and minimum threshold metric must improve to prevent early stopping to pytorch trainer
* Add early stopping test
* Set patience counter to 0 if best metric not defined yet
* Make early stopping a callback. Add callback event for updating the best metric for early stopping callback to trigger on.
* Run make style
* make funciton name sensible
* Improve new argument docstring wording and hope that flakey CI test passes.
* Use on_evaluation callback instead of custom. Remove some debug printing
* Move early stopping arguments and state into early stopping callback
* Run make style
* Remove old code
* Fix docs formatting. make style went rogue on me.
* Remove copied attributes and fix variable
* Add assertions on training arguments instead of mutating them. Move comment out of public docs.
* Make separate test for early stopping callback. Add test of invalid arguments.
* Run make style... I remembered before CI this time!
* appease flake8
* Add EarlyStoppingCallback to callback docs
* Make docstring EarlyStoppingCallabck match other callbacks.
* Fix typo in docs
* Make ci fail
* Try to make tests actually run?
* CI finally failing?
* Fix CI
* Revert "Fix CI"
This reverts commit ca7923be7334d4e571b023478ebdd6b33dfd0ebb.
* Ooops wrong one
* one more try
* Ok ok let's move this elsewhere
* Alternative to globals() (#8667)
* Alternative to globals()
* Error is raised later so return None
* Sentencepiece not installed make some tokenizers None
* Apply Lysandre wisdom
* Slightly clearer comment?
cc @sgugger
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [model_cards]: control arabic model examples
* [model_cards]: control input examples of Geotrend models
* [model_cards]: add link to generatation script
* replace init_ddp_connection for index init
* style
* add finetune test
* add test data
* move generate tensors to device
* add test on EM metric
* style
* allow multi process test
* keep gloo process group for retrieval
* add multi-gpu test
* use custom accelerator
* clean test finetune
* minor
* style
* style
* typo
* use python call instead of imported main fumction
* return_dict fix in modeling_rag
* use float32 in retrieval
* store as float32 as well in the custom knowledge dataset example
* style
* rename to finetune_rag
* style
* update readme
* rename utils and callbacks to utils_rag and callbacks_rag
* fix test
* patrick's comments
* generate dummy data in the finetue test script
* remove dummy data files
* style
You may be unaware but you're running some software that meddles with every commit on https://github.com/huggingface/transformers/
Something is wrong with the software you're using. It adds a reference to almost every PR in the master tree. Which is very wrong. Please check your software and please don't do it again.
Example:
see the bottom of this PR and most other PRs:
https://github.com/huggingface/transformers/pull/8639
* `disable_ngram_loss` fix for prophetnet
* add changes documentation
* fix _compute_loss to use mean reduction and -100 to masked tokens & remove unnecessary arguments
* mean label smoothing loss
* small refactor
* fix test
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
2020-11-19 19:18:07 +01:00
3437 changed files with 726977 additions and 151868 deletions
This is a document explaining how to deal with various issues on Circle-CI. The entries may include actually solutions or pointers to Issues that cover those.
## Circle CI
* pytest worker runs out of resident RAM and gets killed by `cgroups`: https://github.com/huggingface/transformers/issues/11408
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The tasks I am working on is:
* [ ] an official GLUE/SQUaD task: (give the name)
* [ ] my own task or dataset: (give details below)
## To reproduce
Steps to reproduce the behavior:
1.
2.
3.
<!-- If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.-->
## Expected behavior
<!-- A clear and concise description of what you would expect to happen. -->
- label:"An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"
- label:"My own task or dataset (give details below)"
- type:textarea
id:reproduction
validations:
required:true
attributes:
label:Reproduction
description:|
Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
placeholder:|
Steps to reproduce the behavior:
1.
2.
3.
- type:textarea
id:expected-behavior
validations:
required:true
attributes:
label:Expected behavior
description:"A clear and concise description of what you would expect to happen."
description:Submit a proposal/request for a new transformers feature
labels:["feature"]
body:
- type:textarea
id:feature-request
validations:
required:true
attributes:
label:Feature request
description:|
A clear and concise description of the feature proposal. Please provide a link to the paper and code in case they exist.
- type:textarea
id:motivation
validations:
required:true
attributes:
label:Motivation
description:|
Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too.
- type:textarea
id:contribution
validations:
required:true
attributes:
label:Your contribution
description:|
Is there any way that you could help, e.g. by submitting a PR? Make sure to read the CONTRIBUTING.MD [readme](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md)
name: "\U0001F4DA Migration from pytorch-pretrained-bert or pytorch-transformers"
about: Report a problem when migrating from pytorch-pretrained-bert or pytorch-transformers
to transformers
title: ''
labels: Migration
assignees: ''
---
# 📚 Migration
## Information
<!-- Important information -->
Model I am using (Bert, XLNet ...):
Language I am using the model on (English, Chinese ...):
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The tasks I am working on is:
* [ ] an official GLUE/SQUaD task: (give the name)
* [ ] my own task or dataset: (give details below)
## Details
<!-- A clear and concise description of the migration issue.
If you have code snippets, please provide it here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
-->
## Environment info
<!-- You can run the command `python transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
-`transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
<!-- IMPORTANT: which version of the former library do you use? -->
*`pytorch-transformers` or `pytorch-pretrained-bert` version (or branch):
## Checklist
- [ ] I have read the migration guide in the readme.
- label:"An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)"
- label:"My own task or dataset (give details below)"
- type:textarea
id:reproduction
validations:
required:true
attributes:
label:Reproduction
description:|
Please provide a code sample that reproduces the problem you ran into. It can be a Colab link or just a code snippet.
If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
placeholder:|
Steps to reproduce the behavior:
1.
2.
3.
- type:textarea
id:expected-behavior
validations:
required:true
attributes:
label:Expected behavior
description:"A clear and concise description of what you would expect to happen."
render:shell
- type:checkboxes
id:checklist
attributes:
label:Checklist
options:
- label:"I have read the migration guide in the readme.
description:Submit a proposal/request to implement a new model
labels:["New model"]
body:
- type:textarea
id:description-request
validations:
required:true
attributes:
label:Model description
description:|
Put any and all important information relative to the model
- type:checkboxes
id:information-tasks
attributes:
label:Open source status
description:|
Please note that if the model implementation isn't available or if the weights aren't open-source, we are less likely to implement it in `transformers`.
options:
- label:"The model implementation is available"
- label:"The model weights are available"
- type:textarea
id:additional-info
attributes:
label:Provide useful links for the implementation
description:|
Please provide information regarding the implementation, the weights, and the authors.
Please mention the authors by @gh-username if you're aware of their usernames.
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#start-contributing-pull-requests),
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link
to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the
[documentation guidelines](https://github.com/huggingface/transformers/tree/master/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/master/docs#writing-source-documentation).
[documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
## Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors which may be interested in your PR.
members/contributors who may be interested in your PR.
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of**who to tag**.
This is a document explaining how to deal with various issues on github-actions self-hosted CI. The entries may include actually solutions or pointers to Issues that cover those.
## GitHub Actions (self-hosted CI)
* Deepspeed
- if jit build hangs, clear out `rm -rf ~/.cache/torch_extensions/` reference: https://github.com/huggingface/transformers/pull/12723
4. Set up a development environment by running the following command in a virtual environment:
@ -152,34 +175,26 @@ Follow these steps to start contributing:
5. Develop the features on your branch.
As you work on the features, you should make sure that the test suite
passes:
passes. You should run the tests impacted by your changes like this:
```bash
$ pytest tests/<TEST_TO_RUN>.py
```
You can also run the full suite with the following command, but it takes
a beefy machine to produce a result in a decent amount of time now that
Transformers has grown a lot. Here is the command for it:
```bash
$ make test
```
Note, that this command uses `-n auto` pytest flag, therefore, it will start as many parallel `pytest` processes as the number of your computer's CPU-cores, and if you have lots of those and a few GPUs and not a great amount of RAM, it's likely to overload your computer. Therefore, to run the test suite, you may want to consider using this command instead:
CircleCI does not run the slow tests, but github actions does every night!
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_ctrl.py` for an
6. All public methods must have informative docstrings that work nicely with sphinx. See `modeling_bert.py` for an
example.
7. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference
them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
to this dataset.
See more about the checks run on a pull request in our [PR guide](pr_checks)
### Tests
An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
the [tests folder](https://github.com/huggingface/transformers/tree/master/tests) and examples tests in the
#### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md)
**This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/main/CONTRIBUTING.md).**
### Develop on Windows
On windows, you need to configure git to transform Windows `CRLF` line endings to Linux `LF` line endings:
`git config core.autocrlf input`
One way one can run the make command on Window is to pass by MSYS2:
1. [Download MSYS2](https://www.msys2.org/), we assume to have it installed in C:\msys64
2. Open the command line C:\msys64\msys2.exe (it should be available from the start menu)
3. Run in the shell: `pacman -Syu` and install make with `pacman -S make`
4. Add `C:\msys64\usr\bin` to your PATH environment variable.
You can now use `make` from any terminal (Powershell, cmd.exe, etc) 🎉
### Syncing forked main with upstream (HuggingFace) main
To avoid pinging the upstream repository which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs,
when syncing the main branch of a forked repository, please, follow these steps:
1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead merge directly into the forked main.
2. If a PR is absolutely necessary, use the following steps after checking out your branch:
```
$ git checkout -b your-branch-for-syncing
$ git pull --squash --no-commit upstream main
$ git commit -m '<your message without GitHub references>'
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# How To Request Support
This is an Open Source Project so please be mindful that like in any other project of this kind there is no obligation to answer all requests for help.
However, we want to encourage you to ask for help whenever you think it's needed! We are happy about every question we get because it allows us to better understand your needs, possible misunderstandings, and most importantly a way for you to help us make this library better. That being said, this document's main purpose is to provide guidelines at how you can formulate your requests to increase your chances to be understood and to get support.
There are two main venues to receive support: [the forums](https://discuss.huggingface.co/) and [the GitHub issues](https://github.com/huggingface/transformers/issues).
## The Forums
[The user forums](https://discuss.huggingface.co/) are supported by the wide community of the library users and backed up by developers when needed.
If you have a difficulty with deploying this library or some questions, or you'd like to discuss a new feature, please first consider discussing those things at the forums. Only when you feel your subject matter has been crystalized and you still need support from the library developers do proceed to file an [issue](https://github.com/huggingface/transformers/issues).
In particular all "Please explain" questions or objectively very user-specific feature requests belong to the forums. Here are some example of such questions:
* "I would like to use a BertModel within a RL-Agent for a customer support service. How can I use a BertForMaskedLM in my ChatBotModel?"
* "Could you please explain why T5 has no positional embedding matrix under T5Model?"
* "How should I set my generation parameters for translation?"
* "How to train T5 on De->En translation?"
## The GitHub Issues
Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
You are not required to read the following guidelines before opening an issue. However, if you notice that your issue doesn't get any replies, chances are that the developers have one or several difficulties with its quality. In this case, reading the following points and adjusting your issue accordingly could help.
1. Before posting an issue, first search for already posted issues, since chances are someone has already asked a similar question before you.
If you use Google your search query should be:
```
"huggingface" "transformers" your query
```
The first two quoted words tell Google to limit the search to the context of the Huggingface Transformers. The remainder is your query - most commonly this would be the error message the software fails with. We will go deeper into details shortly.
The results of such a query will typically match GitHub issues, Hugging Face forums, StackExchange, and blogs.
If you find relevant hints, you may choose to continue the discussion there if you have follow up questions.
If what you found is similar but doesn't quite answer your problem, please, post a new issue and do include links to similar issues or forum discussions you may have found.
Let's look at some examples:
The error message, often referred to as an assertion, tells us what went wrong. Here is an example of an assertion:
```python
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/transformers/src/transformers/__init__.py", line 34, in <module>
from . import dependency_versions_check
File "/transformers/src/transformers/dependency_versions_check.py", line 34, in <module>
from .utils import is_tokenizers_available
File "/transformers/src/transformers/utils/import_utils.py", line 40, in <module>
from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
```
and it typically includes a traceback, so that we can see the full stack of calls the program made before it fails. This gives us the context to know why the program failed.
Going back to the above example. If you received this error search, look at the very last line of the error which is:
```python
ModuleNotFoundError: No module named 'tqdm.auto'
```
And now we can use it to do the searching on your favorite search engine:
1. first for `"huggingface" "transformers" "ModuleNotFoundError: No module named 'tqdm.auto'"`
2. if you don't find relevant results, then search for just `"ModuleNotFoundError: No module named 'tqdm.auto'"`
3. and finally if nothing still comes up, then remove the outside quotes: `ModuleNotFoundError: No module named 'tqdm.auto'`
If the error includes any messages that include bits unique to your filesystem, always remove those in the search query since other users will not have the same filesystem as yours. For example:
```bash
python -c 'open("/tmp/wrong_path.txt", "r")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/wrong_path.txt'
```
Here you'd search for just: `"FileNotFoundError: [Errno 2] No such file or directory"`
If the local information that you removed were inside the error message and you removed them you may need to remove double quotes since your query is no longer exact. So if the error message was something like:
```bash
ValueError: '/tmp/wrong_path.txt' cannot be found
```
then you'd search for `"ValueError" "cannot be found"`
As you search you will notice that when you don't use quotes often the search engines will return a variety of unrelated hits, which may or may not be what you want.
Experiment with different ways and find which approach gives the most satisfactory results.
2. Keep the issue short, providing the information that you think will aid the developers to understand your situation. Put yourself in the shoes of the person who has never seen your code or knows anything about your custom setup. This mental exercise will help to develop an intuition to what/what not to share"
3. If there is a software failure, always provide the full traceback, for example:
```python
$ python -c 'import transformers'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/transformers/src/transformers/__init__.py", line 34, in <module>
from . import dependency_versions_check
File "/transformers/src/transformers/dependency_versions_check.py", line 34, in <module>
from .utils import is_tokenizers_available
File "/transformers/src/transformers/utils/import_utils.py", line 40, in <module>
from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
```
As compared to providing just the last line of the error message, e.g.:
```python
ModuleNotFoundError: No module named 'tqdm.auto'
```
which is not sufficient.
If your application is running on more than one GPU (e.g. under `DistributedDataParallel`) and typically getting every log and traceback printed multiple times, please make sure that you paste only one copy of it. At times the traceback from parallel processes may get interleaved - so either disentangle these or change the loggers to log only for `local_rank==0` so that only one process logs things.
4. When quoting a traceback, command line instructions and any type of code always enclose it in triple backticks inside the editor window, that is:
If it's a command line with a long argument list, please consider breaking it down using backslashes and new lines. Here is an example of a good command line quote:
If you don't break it up, one has to scroll horizontally which often makes it quite difficult to quickly see what's happening.
The backslashes allow us to copy the command directly into the console to run it, without needing to edit it.
5. Include only the important information that you think will help the developer to quickly identify the problem.
For example applications often create huge amounts of logs. Ask yourself whether providing all or parts of the log is useful.
Pasting a 100-1000 lines of log into the issue is an immediate turn off, since it will take a lot of time to figure out where the pertinent parts of the log are.
Attaching a full log can be helpful if it's done as an attachment, if it's enclosed in the following html code in the comment editor window:
```
<details>
<summary>Full log</summary>
<pre>
many
lines
go
here
</pre>
</details>
```
which would result in the following entry, which can be opened if desired, but otherwise takes little space.
<details>
<summary>Full log</summary>
<pre>
many
lines
go
here
</pre>
</details>
You could also provide a link to a pastebin service, but this is less beneficial since those links tend to expire quickly and future readers of your issue might not be able to access that log file anymore and may lack some context.
6. If this is an issue in your code, do try to reduce that code to a minimal example that still demonstrates the problem. Please ask at the forums if you have a hard time figuring how to do that. Please realize that we don't have the luxury of having time to try and understand all of your custom code.
If you really tried to make a short reproducible code but couldn't figure it out, it might be that having a traceback will give the developer enough information to know what's going on. But if it is not enough and we can't reproduce the problem, we can't really solve it.
Do not despair if you can't figure it out from the beginning, just share what you can and perhaps someone else will be able to help you at the forums.
If your setup involves any custom datasets, the best way to help us reproduce the problem is to create a [Google Colab notebook](https://colab.research.google.com/) that demonstrates the issue and once you verify that the issue still exists, include a link to that notebook in the Issue. Just make sure that you don't copy and paste the location bar url of the open notebook - as this is private and we won't be able to open it. Instead, you need to click on `Share` in the right upper corner of the notebook, select `Get Link` and then copy and paste the public link it will give to you.
7. If you forked off some of this project's code or example applications, please, do not ask us to go into your code repository and figure out what you may have done. The code is already very complex and unless there is an easy way to do a diff and it's a small diff, it won't be possible to find someone with time on their hands to make a lengthy investigation. Albeit, you might find someone at the forums who will be generous to do this for you.
8. Before reporting an issue, first, always try to update your environment to the latest official version of this library. We have no resources to go and debug older revisions, which could easily have bugs that have been fixed in the latest released version.
We understand that this is not always possible, especially when APIs change, in which case file an issue against the highest library version your environment can support.
Of course, if you upgrade the library, always retest that the problem is still there.
9. Please do not ask us to reproduce an issue with your custom data, since we don't have it. So, either you should use some existing dataset supported by HF datasets or you need to supply a code that generates a small sample on the fly, or some another quick and simple way to get it.
Please do not send us any non-public domain data that may require a license or a permission to be used.
10. Do not tag multiple developers on the issue unless you know this is expected, either because you asked them and they gave you an explicit permission to tag them or the issue template instructs you to do so.
The "who to tag for what domain" part of the issue template is there to help users direct their questions to the right developers who are designated maintainers of project's specific domains. They can then decide at their own discretion to tag other developers if they feel it'd help move the issue forward.
We currently don't have a triage service and we trust your capacity to identify the right domain and thus the persons to tag in your issue. If you are not sure, please use the forums to ask for guidance.
When in doubt, err on the side of not tagging a given person. If you tag multiple people out of context or permission don't be surprised if you get no response at all. Please remember that every time you tag someone, they get a notification and you're taking their time without their permission. Please be sensitive to that.
If you got helped by one of the developers in the past please don't tag them in future issues, unless they are listed in the issue template for the domain you are asking about or that developer gave you an explicit permission to tag them in future issues.
If you see a certain developer doing multiple and/or recent commits into a specific area of the project that you feel is relevant to your issue, it is not a good reason to tag them. Various developers may be fixing things that prevent them from moving forward, but often their work is focused on a totally different domain. And while they may or may not know how to help you with the problem at hand, it would benefit the whole community much more if they focus on the domain of their unique expertise.
11. Use the Edit button. Take your time, and re-read and improve the wording and formatting to make your posts and comments as easy to understand as possible.
Avoid posting multiple comments in a row, as each comment generates a notification for the developers tagged in that issue. If you happened to post multiple comments in a row, and nobody followed up yet - consider merging those into one or a few comments while editing the combined content to be coherent.
If you choose to edit your older comments after others posted follow up comments you need to be aware that your modifications might not be noticed, so if it's not a typo fixing, try to write a new comment flagging that something has been changed in the previous comments.
For example, the very first comment is the most important one. If while the thread unfolds you realize that things aren't as they seemed to you originally you may want to edit the first post to reflect the up-to-date understanding of the issue at hand so that it helps those who read your issue in the future quickly understand what's going on and not need to sift through dozens of comments. It also helps to indicate that the post was edited. So, those reading the thread later can understand why there might be certain discontinuity in the information flow.
Use bullets and items if you have lists of items and the outcome improves overall readability.
Use backticks to refer to class and function names, e.g. `BartModel` and `generate` as these stand out and improve the speed of a reader's comprehension.
Try not use italics and bold text too much as these often make the text more difficult to read.
12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter it could be quite impossible to find which specific comment you're referring to.
To get the link to the specific comment do not copy the url from the location bar of your browser, but instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
For example the first link is a link to an issue, and the second to a specific comment in the same issue:
13. If you are replying to a last comment, it's totally fine to make your reply with just your comment in it. The readers can follow the information flow here.
But if you're replying to a comment that happened some comments back it's always a good practice to quote just the relevant lines you're replying it. The `>` is used for quoting, or you can always use the menu to do so. For example your editor box will look like:
```
> How big is your gpu cluster?
Our cluster is made of 256 gpus.
```
If you are addressing multiple comments, quote the relevant parts of each before your answer. Some people use the same comment to do multiple replies, others separate them into separate comments. Either way works. The latter approach helps for linking to a specific comment.
In general the best way to figure out what works the best is learn from issues posted by other people - see which issues get great responses and which get little to no response - observe what the posters who received great responses did differently from those who did not.
Thank you for reading this somewhat lengthy document. We would like to conclude that these are not absolute rules, but a friendly advice that will help maximize the chances for us to understand what you are trying to communicate, reproduce the problem then resolve it to your satisfaction and the benefit of the whole community.
If after reading this document there are remaining questions on how and why or there is a need for further elucidation, please, don't hesitate to ask your question in [this thread](https://discuss.huggingface.co/t/how-to-request-support/3128).
@md5sum -c --quiet md5sum.saved ||(printf"\nError: the version dependency table is outdated.\nPlease run 'make fixup' or 'make style' and commit the changes.\n\n"&&exit 1)
<p>State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
<p>State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow</p>
</h3>
🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture can be used as a standalone and modified to enable quick research experiments.
🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
🤗 Transformers is backed by the two most popular deep learning libraries, [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with a seamless integration between them, allowing you to train your models with one then load it for inference with the other.
* 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages.
* 🖼️ Images, for tasks like image classification, object detection, and segmentation.
* 🗣️ Audio, for tasks like speech recognition and audio classification.
Transformer models can also perform tasks on **several modalities combined**, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
🤗 Transformers is backed by the three most popular deep learning libraries — [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) — with a seamless integration between them. It's straightforward to train your models with one before loading them for inference with the other.
## Online demos
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer an [inference API](https://huggingface.co/pricing) to use those models.
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning, & an inference API](https://huggingface.co/pricing) for public and private models.
Here are a few examples:
In Natural Language Processing:
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Name Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [Natural Langugage Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Natural Language Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
In Computer Vision:
- [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224)
- [Object Detection with DETR](https://huggingface.co/facebook/detr-resnet-50)
- [Image Segmentation with DETR](https://huggingface.co/facebook/detr-resnet-50-panoptic)
In Audio:
- [Automatic Speech Recognition with Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h)
- [Keyword Spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repo’s text generation capabilities.
## If you are looking for custom support from the Hugging Face team
To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model training. Here is how to quickly use a pipeline to classify positive versus negative texts
To immediately use a model on a given input (text, image, audio, ...), we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
```python
>>>fromtransformersimportpipeline
# Allocate a pipeline for sentiment-analysis
>>>classifier=pipeline('sentiment-analysis')
>>>classifier('We are very happy to include pipeline into the transformers repository.')
[{'label':'POSITIVE','score':0.9978193640708923}]
>>>classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label':'POSITIVE','score':0.9996980428695679}]
```
The second line of code downloads and caches the pretrained model used by the pipeline, the third line evaluates it on the given text. Here the answer is "positive" with a confidence of 99.8%.
The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%.
This is another example of pipeline used for that can extract question answers from some context:
Many tasks have a pre-trained `pipeline` ready to go, in NLP but also in computer vision and speech. For example, we can easily extract detected objects in an image:
On top of the answer, the pretrained model used here returned its confidence score, along with the start position and its end position in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/transformers/task_summary.html).
Here we get a list of objects detected in the image, with a box surrounding the object and a confidence score. Here is the original image on the right, with the predictions displayed on the left:
To download and use any of the pretrained models on your given task, you just need to use those three lines of codes (PyTorch version):
You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/docs/transformers/task_summary).
In addition to `pipeline`, to download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:
```python
>>> from transformers import AutoTokenizer, AutoModel
@ -91,7 +167,8 @@ To download and use any of the pretrained models on your given task, you just ne
>>> from transformers import AutoTokenizer, TFAutoModel
@ -102,14 +179,14 @@ or for TensorFlow:
>>> outputs = model(**inputs)
```
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on one (or list) of texts (as we can see on the fourth line of both code examples). It will output a dictionary you can directly pass to your model (which is done on the fifth line).
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the above examples) or a list. It will output a dictionary that you can use in downstream code or simply directly pass to your model using the ** argument unpacking operator.
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. For instance, [this tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model in classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune the on a new dataset.
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use as usual. [This tutorial](https://huggingface.co/docs/transformers/training) explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune on a new dataset.
## Why should I use transformers?
1. Easy-to-use state-of-the-art models:
- High performance on NLU and NLG tasks.
- High performance on natural language understanding & generation, computer vision, and audio tasks.
- Low barrier to entry for educators and practitioners.
- Few user-facing abstractions with just three classes to learn.
- A unified API for using all our pretrained models.
@ -117,44 +194,44 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta
1. Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
- Dozens of architectures with over 60,000 pretrained models across all modalities.
1. Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch frameworks at will.
- Seamlessly pick the right framework for training, evaluation, production.
- Move a single model between TF2.0/PyTorch/JAX frameworks at will.
- Seamlessly pick the right framework for training, evaluation and production.
1. Easily customize a model or an example to your needs:
- Examples for each architecture to reproduce the results by the official authors of said architecture.
- Expose the models internal as consistently as possible.
- We provide examples for each architecture to reproduce the results published by its original authors.
- Model internals are exposed as consistently as possible.
- Model files can be used independently of the library for quick experiments.
## Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving in additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/master/examples) are just that: examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library (possibly, [Accelerate](https://huggingface.co/docs/accelerate)).
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/main/examples) are just that: examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
## Installation
### With pip
This repository is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for [examples](https://github.com/huggingface/transformers/tree/master/examples)) and TensorFlow 2.0.
This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+ and TensorFlow 2.3+.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
First, create a virtual environment with the version of Python you're going to use and activate it.
Then, you will need to install at least one of TensorFlow 2.0, PyTorch or Flax.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform and/or [Flax installation page](https://github.com/google/flax#quick-install).
Then, you will need to install at least one of Flax, PyTorch or TensorFlow.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) and/or [Flax](https://github.com/google/flax#quick-install) and [Jax](https://github.com/google/jax#installation) installation pages regarding the specific install command for your platform.
When TensorFlow 2.0 and/or PyTorch has been installed, 🤗 Transformers can be installed using pip as follows:
When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install transformers
```
If you'd like to play with the examples, you must [install the library from source](https://huggingface.co/transformers/installation.html#installing-from-source).
If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you must [install the library from source](https://huggingface.co/docs/transformers/installation#installing-from-source).
### With conda
@ -166,77 +243,183 @@ Since Transformers version v4.0.0, we now have a conda channel: `huggingface`.
conda install -c huggingface transformers
```
Follow the installation pages of TensorFlow, PyTorch or Flax to see how to install them with conda.
Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.
## Models architectures
## Model architectures
🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/transformers/model_summary.html) for a high-level summary of each them):
**[All the model checkpoints](https://huggingface.co/models)** provided by 🤗 Transformers are seamlessly integrated from the huggingface.co [model hub](https://huggingface.co) where they are uploaded directly by [users](https://huggingface.co/users) and [organizations](https://huggingface.co/organizations).
1. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/transformers/model_doc/bertgeneration.html)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[Blenderbot](https://huggingface.co/transformers/model_doc/blenderbot.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft Research) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/transformers/model_doc/dpr.html)** (from Facebook) released with the paper [Dense Passage Retrieval
for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon
Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LXMERT](https://huggingface.co/transformers/model_doc/lxmert.html)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MBart](https://huggingface.co/transformers/model_doc/mbart.html)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MT5](https://huggingface.co/transformers/model_doc/mt5.html)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Pegasus](https://huggingface.co/transformers/model_doc/pegasus.html)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777)> by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[ProphetNet](https://huggingface.co/transformers/model_doc/prophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
ultilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lampleand Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
Current number of checkpoints: 
🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/docs/transformers/model_summary) for a high-level summary of each them):
1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/main/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noah’s Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/main/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](https://huggingface.co/docs/transformers/main/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
To cehck if each model has an implementation in PyTorch/TensorFlow/Flax or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/transformers/index.html#bigtable)
To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/docs/transformers/index#supported-frameworks).
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations. You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/docs/transformers/examples).
## Learn more
| Section | Description |
|-|-|
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/transformers/task_summary.html) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/transformers/preprocessing.html) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/transformers/training.html) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/master/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/transformers/model_sharing.html) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/transformers/migration.html) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |
| [Documentation](https://huggingface.co/docs/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/docs/transformers/task_summary) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/docs/transformers/preprocessing) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/docs/transformers/training) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/main/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/docs/transformers/model_sharing) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/docs/transformers/migration) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |
## Citation
We now have a [paper](https://arxiv.org/abs/1910.03771) you can cite for the 🤗 Transformers library:
We now have a [paper](https://www.aclweb.org/anthology/2020.emnlp-demos.6/) you can cite for the 🤗 Transformers library:
```bibtex
@article{Wolf2019HuggingFacesTS,
title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush},
journal={ArXiv},
year={2019},
volume={abs/1910.03771}
@inproceedings{wolf-etal-2020-transformers,
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
🤗 Transformers는 분류, 정보 추출, 질문 답변, 요약, 번역, 문장 생성 등을 100개 이상의 언어로 수행할 수 있는 수천개의 사전학습된 모델을 제공합니다. 우리의 목표는 모두가 최첨단의 NLP 기술을 쉽게 사용하는 것입니다.
🤗 Transformers는 이러한 사전학습 모델을 빠르게 다운로드해 특정 텍스트에 사용하고, 원하는 데이터로 fine-tuning해 커뮤니티나 우리의 [모델 허브](https://huggingface.co/models)에 공유할 수 있도록 API를 제공합니다. 또한, 모델 구조를 정의하는 각 파이썬 모듈은 완전히 독립적이여서 연구 실험을 위해 손쉽게 수정할 수 있습니다.
🤗 Transformers는 가장 유명한 3개의 딥러닝 라이브러리를 지원합니다. 이들은 서로 완벽히 연동됩니다 — [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/). 간단하게 이 라이브러리 중 하나로 모델을 학습하고, 또 다른 라이브러리로 추론을 위해 모델을 불러올 수 있습니다.
## 온라인 데모
대부분의 모델을 [모델 허브](https://huggingface.co/models) 페이지에서 바로 테스트해볼 수 있습니다. 공개 및 비공개 모델을 위한 [비공개 모델 호스팅, 버전 관리, 추론 API](https://huggingface.co/pricing)도 제공합니다.
예시:
- [BERT로 마스킹된 단어 완성하기](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Electra를 이용한 개체명 인식](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [GPT-2로 텍스트 생성하기](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [BART를 이용한 요약](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [DistilBERT를 이용한 질문 답변](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
원하는 텍스트에 바로 모델을 사용할 수 있도록, 우리는 `pipeline` API를 제공합니다. Pipeline은 사전학습 모델과 그 모델을 학습할 때 적용한 전처리 방식을 하나로 합칩니다. 다음은 긍정적인 텍스트와 부정적인 텍스트를 분류하기 위해 pipeline을 사용한 간단한 예시입니다:
```python
>>>fromtransformersimportpipeline
# Allocate a pipeline for sentiment-analysis
>>>classifier=pipeline('sentiment-analysis')
>>>classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label':'POSITIVE','score':0.9996980428695679}]
```
코드의 두번째 줄은 pipeline이 사용하는 사전학습 모델을 다운로드하고 캐시로 저장합니다. 세번째 줄에선 그 모델이 주어진 텍스트를 평가합니다. 여기서 모델은 99.97%의 확률로 텍스트가 긍정적이라고 평가했습니다.
많은 NLP 과제들을 `pipeline`으로 바로 수행할 수 있습니다. 예를 들어, 질문과 문맥이 주어지면 손쉽게 답변을 추출할 수 있습니다:
답변뿐만 아니라, 여기에 사용된 사전학습 모델은 확신도와 토크나이즈된 문장 속 답변의 시작점, 끝점까지 반환합니다. [이 튜토리얼](https://huggingface.co/docs/transformers/task_summary)에서 `pipeline` API가 지원하는 다양한 과제를 확인할 수 있습니다.
코드 3줄로 원하는 과제에 맞게 사전학습 모델을 다운로드 받고 사용할 수 있습니다. 다음은 PyTorch 버전입니다:
```python
>>> from transformers import AutoTokenizer, AutoModel
토크나이저는 사전학습 모델의 모든 전처리를 책임집니다. 그리고 (위의 예시처럼) 1개의 스트링이나 리스트도 처리할 수 있습니다. 토크나이저는 딕셔너리를 반환하는데, 이는 다운스트림 코드에 사용하거나 언패킹 연산자 ** 를 이용해 모델에 바로 전달할 수도 있습니다.
모델 자체는 일반적으로 사용되는 [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module)나 [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model)입니다. [이 튜토리얼](https://huggingface.co/transformers/training.html)은 이러한 모델을 표준적인 PyTorch나 TensorFlow 학습 과정에서 사용하는 방법, 또는 새로운 데이터로 fine-tune하기 위해 `Trainer` API를 사용하는 방법을 설명해줍니다.
## 왜 transformers를 사용해야 할까요?
1. 손쉽게 사용할 수 있는 최첨단 모델:
- NLU와 NLG 과제에서 뛰어난 성능을 보입니다.
- 교육자 실무자에게 진입 장벽이 낮습니다.
- 3개의 클래스만 배우면 바로 사용할 수 있습니다.
- 하나의 API로 모든 사전학습 모델을 사용할 수 있습니다.
1. 더 적은 계산 비용, 더 적은 탄소 발자국:
- 연구자들은 모델을 계속 다시 학습시키는 대신 학습된 모델을 공유할 수 있습니다.
- 실무자들은 학습에 필요한 시간과 비용을 절약할 수 있습니다.
- 수십개의 모델 구조, 2,000개 이상의 사전학습 모델, 100개 이상의 언어로 학습된 모델 등.
1. 모델의 각 생애주기에 적합한 프레임워크:
- 코드 3줄로 최첨단 모델을 학습하세요.
- 자유롭게 모델을 TF2.0나 PyTorch 프레임워크로 변환하세요.
- 학습, 평가, 공개 등 각 단계에 맞는 프레임워크를 원하는대로 선택하세요.
1. 필요한 대로 모델이나 예시를 커스터마이즈하세요:
- 우리는 저자가 공개한 결과를 재현하기 위해 각 모델 구조의 예시를 제공합니다.
- 모델 내부 구조는 가능한 일관적으로 공개되어 있습니다.
- 빠른 실험을 위해 모델 파일은 라이브러리와 독립적으로 사용될 수 있습니다.
## 왜 transformers를 사용하지 말아야 할까요?
- 이 라이브러리는 신경망 블록을 만들기 위한 모듈이 아닙니다. 연구자들이 여러 파일을 살펴보지 않고 바로 각 모델을 사용할 수 있도록, 모델 파일 코드의 추상화 수준을 적정하게 유지했습니다.
- 학습 API는 모든 모델에 적용할 수 있도록 만들어지진 않았지만, 라이브러리가 제공하는 모델들에 적용할 수 있도록 최적화되었습니다. 일반적인 머신 러닝을 위해선, 다른 라이브러리를 사용하세요.
- 가능한 많은 사용 예시를 보여드리고 싶어서, [예시 폴더](https://github.com/huggingface/transformers/tree/main/examples)의 스크립트를 준비했습니다. 이 스크립트들을 수정 없이 특정한 문제에 바로 적용하지 못할 수 있습니다. 필요에 맞게 일부 코드를 수정해야 할 수 있습니다.
## 설치
### pip로 설치하기
이 저장소는 Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+, TensorFlow 2.3+에서 테스트 되었습니다.
[가상 환경](https://docs.python.org/3/library/venv.html)에 🤗 Transformers를 설치하세요. Python 가상 환경에 익숙하지 않다면, [사용자 가이드](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/)를 확인하세요.
우선, 사용할 Python 버전으로 가상 환경을 만들고 실행하세요.
그 다음, Flax, PyTorch, TensorFlow 중 적어도 하나는 설치해야 합니다.
플랫폼에 맞는 설치 명령어를 확인하기 위해 [TensorFlow 설치 페이지](https://www.tensorflow.org/install/), [PyTorch 설치 페이지](https://pytorch.org/get-started/locally/#start-locally), [Flax 설치 페이지](https://github.com/google/flax#quick-install)를 확인하세요.
이들 중 적어도 하나가 설치되었다면, 🤗 Transformers는 다음과 같이 pip을 이용해 설치할 수 있습니다:
```bash
pip install transformers
```
예시들을 체험해보고 싶거나, 최최최첨단 코드를 원하거나, 새로운 버전이 나올 때까지 기다릴 수 없다면 [라이브러리를 소스에서 바로 설치](https://huggingface.co/docs/transformers/installation#installing-from-source)하셔야 합니다.
### conda로 설치하기
Transformers 버전 v4.0.0부터, conda 채널이 생겼습니다: `huggingface`.
🤗 Transformers는 다음과 같이 conda로 설치할 수 있습니다:
```shell script
conda install -c huggingface transformers
```
Flax, PyTorch, TensorFlow 설치 페이지에서 이들을 conda로 설치하는 방법을 확인하세요.
## 모델 구조
**🤗 Transformers가 제공하는 [모든 모델 체크포인트](https://huggingface.co/models)** 는 huggingface.co [모델 허브](https://huggingface.co)에 완벽히 연동되어 있습니다. [개인](https://huggingface.co/users)과 [기관](https://huggingface.co/organizations)이 모델 허브에 직접 업로드할 수 있습니다.
현재 사용 가능한 모델 체크포인트의 개수: 
🤗 Transformers는 다음 모델들을 제공합니다 (각 모델의 요약은 [여기](https://huggingface.co/docs/transformers/model_summary)서 확인하세요):
1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) and a German version of DistilBERT.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/main/model_doc/donut)** (from NAVER) released with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noah’s Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/main/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](https://huggingface.co/docs/transformers/main/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI) released with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
1. 새로운 모델을 올리고 싶나요? 우리가 **상세한 가이드와 템플릿** 으로 새로운 모델을 올리도록 도와드릴게요. 가이드와 템플릿은 이 저장소의 [`templates`](./templates) 폴더에서 확인하실 수 있습니다. [컨트리뷰션 가이드라인](./CONTRIBUTING.md)을 꼭 확인해주시고, PR을 올리기 전에 메인테이너에게 연락하거나 이슈를 오픈해 피드백을 받으시길 바랍니다.
각 모델이 Flax, PyTorch, TensorFlow으로 구현되었는지 또는 🤗 Tokenizers 라이브러리가 지원하는 토크나이저를 사용하는지 확인하려면, [이 표](https://huggingface.co/docs/transformers/index#supported-frameworks)를 확인하세요.
이 구현은 여러 데이터로 검증되었고 (예시 스크립트를 참고하세요) 오리지널 구현의 성능과 같아야 합니다. [도큐먼트](https://huggingface.co/docs/transformers/examples)의 Examples 섹션에서 성능에 대한 자세한 설명을 확인할 수 있습니다.
## 더 알아보기
| 섹션 | 설명 |
|-|-|
| [도큐먼트](https://huggingface.co/transformers/) | 전체 API 도큐먼트와 튜토리얼 |
| [과제 요약](https://huggingface.co/docs/transformers/task_summary) | 🤗 Transformers가 지원하는 과제들 |
| [전처리 튜토리얼](https://huggingface.co/docs/transformers/preprocessing) | `Tokenizer` 클래스를 이용해 모델을 위한 데이터 준비하기 |
| [학습과 fine-tuning](https://huggingface.co/docs/transformers/training) | 🤗 Transformers가 제공하는 모델 PyTorch/TensorFlow 학습 과정과 `Trainer` API에서 사용하기 |
| [퀵 투어: Fine-tuning/사용 스크립트](https://github.com/huggingface/transformers/tree/main/examples) | 다양한 과제에서 모델 fine-tuning하는 예시 스크립트 |
| [모델 공유 및 업로드](https://huggingface.co/docs/transformers/model_sharing) | 커뮤니티에 fine-tune된 모델을 업로드 및 공유하기 |
🤗 Transformers 라이브러리를 인용하고 싶다면, 이 [논문](https://www.aclweb.org/anthology/2020.emnlp-demos.6/)을 인용해 주세요:
```bibtex
@inproceedings{wolf-etal-2020-transformers,
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (来自 Google Research and the Toyota Technological Institute at Chicago) 伴随论文 [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), 由 Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut 发布。
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (来自 Facebook) 伴随论文 [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) 由 Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer 发布。
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (来自 École polytechnique) 伴随论文 [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) 由 Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis 发布。
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (来自 VinAI Research) 伴随论文 [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) 由 Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen 发布。
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (来自 Microsoft) 伴随论文 [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) 由 Hangbo Bao, Li Dong, Furu Wei 发布。
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (来自 Google) 伴随论文 [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) 由 Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova 发布。
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (来自 VinAI Research) 伴随论文 [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) 由 Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen 发布。
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (来自 Google Research) 伴随论文 [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) 由 Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed 发布。
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (来自 Facebook) 伴随论文 [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) 由 Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston 发布。
1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (来自 Alexa) 伴随论文 [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) 由 Adrian de Wynter and Daniel J. Perry 发布。
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (来自 Google Research) 伴随论文 [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) 由 Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel 发布。
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (来自 Inria/Facebook/Sorbonne) 伴随论文 [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) 由 Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot 发布。
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (来自 Google Research) 伴随论文 [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) 由 Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting 发布。
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (来自 OpenAI) 伴随论文 [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) 由 Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever 发布。
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (来自 Salesforce) 伴随论文 [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) 由 Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong 发布。
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (来自 Facebook AI) 伴随论文 [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) 由 Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie 发布。
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (来自 Tsinghua University) 伴随论文 [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) 由 Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun 发布。
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (来自 Salesforce) 伴随论文 [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) 由 Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher 发布。
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (来自 Microsoft) 伴随论文 [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) 由 Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang 发布。
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (来自 Facebook) 伴随论文 [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) 由 Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli 发布。
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (来自 Berkeley/Facebook/Google) 伴随论文 [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) 由 Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch 发布。
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (来自 Facebook) 伴随论文 [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) 由 Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou 发布。
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (来自 Facebook) 伴随论文 [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) 由 Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko 发布。
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (来自 Microsoft Research) 伴随论文 [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) 由 Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan 发布。
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (来自 HuggingFace), 伴随论文 [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) 由 Victor Sanh, Lysandre Debut and Thomas Wolf 发布。 同样的方法也应用于压缩 GPT-2 到 [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa 到 [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT 到 [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) 和德语版 DistilBERT。
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (来自 Microsoft Research) 伴随论文 [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) 由 Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei 发布。
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (来自 Facebook) 伴随论文 [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) 由 Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih 发布。
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (来自 Intel Labs) 伴随论文 [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) 由 René Ranftl, Alexey Bochkovskiy, Vladlen Koltun 发布。
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (来自 Google Research/Stanford University) 伴随论文 [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) 由 Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning 发布。
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (来自 CNRS) 伴随论文 [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) 由 Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab 发布。
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (来自 Facebook AI) 伴随论文 [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) 由 Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela 发布。
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (来自 Google Research) 伴随论文 [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) 由 James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon 发布。
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (来自 CMU/Google Brain) 伴随论文 [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) 由 Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le 发布。
1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (来自 KAIST) 伴随论文 [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) 由 Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim 发布。
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (来自 OpenAI) 伴随论文 [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) 由 Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever 发布。
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (来自 EleutherAI) 随仓库 [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) 发布。作者为 Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy 发布。
1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (来自 OpenAI) 伴随论文 [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) 由 Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever** 发布。
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (来自 EleutherAI) 伴随论文 [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) 由 Ben Wang and Aran Komatsuzaki 发布。
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (来自 UCSD, NVIDIA) 伴随论文 [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) 由 Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang 发布。
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (来自 Facebook) 伴随论文 [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) 由 Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed 发布。
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (来自 Berkeley) 伴随论文 [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) 由 Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer 发布。
1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (来自 OpenAI) 伴随论文 [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) 由 Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever 发布。
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) 由 Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou 发布。
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) 由 Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou 发布。
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (来自 Microsoft Research Asia) 伴随论文 [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) 由 Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei 发布。
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (来自 Microsoft Research Asia) 伴随论文 [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) 由 Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei 发布。
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (来自 AllenAI) 伴随论文 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) 由 Iz Beltagy, Matthew E. Peters, Arman Cohan 发布。
1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (来自 Meta AI) 伴随论文 [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) 由 Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze 发布。
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (来自 AllenAI) 伴随论文 [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) 由 Iz Beltagy, Matthew E. Peters, Arman Cohan 发布。
1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (来自 Google AI) released 伴随论文 [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) 由 Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang 发布。
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (来自 Studio Ousia) 伴随论文 [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) 由 Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto 发布。
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (来自 UNC Chapel Hill) 伴随论文 [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) 由 Hao Tan and Mohit Bansal 发布。
1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (来自 Facebook) 伴随论文 [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) 由 Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert 发布。
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (来自 Facebook) 伴随论文 [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) 由 Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer 发布。
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (来自 Facebook) 伴随论文 [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) 由 Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan 发布。
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (来自 NVIDIA) 伴随论文 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 由 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 发布。
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (来自 NVIDIA) 伴随论文 [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) 由 Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro 发布。
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (来自 Studio Ousia) 伴随论文 [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) 由 Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka 发布。
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (来自 CMU/Google Brain) 伴随论文 [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) 由 Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou 发布。
1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (来自 Apple) 伴随论文 [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) 由 Sachin Mehta and Mohammad Rastegari 发布。
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (来自 Microsoft Research) 伴随论文 [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) 由 Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu 发布。
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (来自 Google AI) 伴随论文 [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) 由 Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel 发布。
1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (来自 中国人民大学 AI Box) 伴随论文 [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) 由 Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen 发布。
1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (来自华为诺亚方舟实验室) 伴随论文 [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) 由 Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu 发布。
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (来自 Meta) 伴随论文 [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) 由 the NLLB team 发布。
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (来自 the University of Wisconsin - Madison) 伴随论文 [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) 由 Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh 发布。
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (来自 Meta AI) 伴随论文 [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) 由 Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al 发布。
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (来自 Google AI) 伴随论文 [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) 由 Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby 发布。
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (来自 Google) 伴随论文 [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) 由 Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu 发布。
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (来自 Deepmind) 伴随论文 [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) 由 Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira 发布。
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (来自 VinAI Research) 伴随论文 [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) 由 Dat Quoc Nguyen and Anh Tuan Nguyen 发布。
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (来自 UCLA NLP) 伴随论文 [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) 由 Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang 发布。
1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (来自 Sea AI Labs) 伴随论文 [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) 由 Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng 发布。
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (来自 NVIDIA) 伴随论文 [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) 由 Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius 发布。
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (来自 Facebook) 伴随论文 [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) 由 Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela 发布。
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (来自 Google Research) 伴随论文 [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) 由 Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang 发布。
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (来自 Google Research) 伴随论文 [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) 由 Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya 发布。
1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (来自 Google Research) 伴随论文 [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) 由 Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder 发布。
1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (来自 Facebook), 伴随论文 [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) 由 Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov 发布。
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (来自 ZhuiyiTechnology), 伴随论文 [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) 由 Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu 发布。
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (来自 NVIDIA) 伴随论文 [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) 由 Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo 发布。
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (来自 ASAPP) 伴随论文 [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) 由 Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi 发布。
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (来自 ASAPP) 伴随论文 [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) 由 Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi 发布。
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (来自 Facebook), 伴随论文 [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) 由 Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino 发布。
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (来自 Facebook) 伴随论文 [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) 由 Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau 发布。
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (来自 Tel Aviv University) 伴随论文 [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) 由 Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy 发布。
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (来自 Berkeley) 伴随论文 [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) 由 Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer 发布。
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (来自 Microsoft) 伴随论文 [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) 由 Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo 发布。
1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/main/model_doc/swinv2)** (来自 Microsoft) 伴随论文 [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) 由 Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo 发布。
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (来自 Google AI) 伴随论文 [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) 由 Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu 发布。
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (来自 Google AI) 伴随论文 [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) 由 Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu 发布。
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (来自 Google AI) 伴随论文 [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) 由 Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos 发布。
1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (来自 Microsoft Research) 伴随论文 [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) 由 Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou 发布。
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (来自 Google/CMU) 伴随论文 [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) 由 Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov 发布。
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (来自 Microsoft) 伴随论文 [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) 由 Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei 发布。
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (来自 Microsoft Research) 伴随论文 [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) 由 Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang 发布。
1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (来自 Tsinghua University and Nankai University) 伴随论文 [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) 由 Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu 发布。
1. **[VideoMAE](https://huggingface.co/docs/transformers/main/model_doc/videomae)** (来自 Multimedia Computing Group, Nanjing University) 伴随论文 [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) 由 Zhan Tong, Yibing Song, Jue Wang, Limin Wang 发布。
1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (来自 NAVER AI Lab/Kakao Enterprise/Kakao Brain) 伴随论文 [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) 由 Wonjae Kim, Bokyung Son, Ildoo Kim 发布。
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (来自 Google AI) 伴随论文 [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) 由 Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby 发布。
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (来自 UCLA NLP) 伴随论文 [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) 由 Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang 发布。
1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (来自 Meta AI) 伴随论文 [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) 由 Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick 发布。
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (来自 Facebook AI) 伴随论文 [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) 由 Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli 发布。
1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (来自 Facebook AI) 伴随论文 [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) 由 Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino 发布。
1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (来自 Facebook) 伴随论文 [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) 由 Guillaume Lample and Alexis Conneau 发布。
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (来自 Microsoft Research) 伴随论文 [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) 由 Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou 发布。
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (来自 Facebook AI), 伴随论文 [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) 由 Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov 发布。
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (来自 Facebook AI) 伴随论文 [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) 由 Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau 发布。
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (来自 Google/CMU) 伴随论文 [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) 由 Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le 发布。
1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (来自 Facebook AI) 伴随论文 [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) 由 Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli 发布。
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!---
A useful guide for English-Traditional Chinese translation of Hugging Face documentation
-Add space around English words and numbers when they appear between Chinese characters. E.g., 共 100 多種語言; 使用 transformers 函式庫。
-Use square quotes, e.g.,「引用」
-Some of terms in the file can be found at National Academy for Educational Research (https://terms.naer.edu.tw/), an official website providing bilingual translations between English and Traditional Chinese.
Dictionary
API: API (不翻譯)
add: 加入
checkpoint: 檢查點
code: 程式碼
community: 社群
confidence: 信賴度
dataset: 資料集
documentation: 文件
example: 基本翻譯為「範例」,或依語意翻為「例子」
finetune: 微調
Hugging Face: Hugging Face(不翻譯)
implementation: 實作
inference: 推論
library: 函式庫
module: 模組
NLP/Natural Language Processing: 以 NLP 出現時不翻譯,以 Natural Language Processing 出現時翻譯為自然語言處理
online demos: 線上Demo
pipeline: pipeline(不翻譯)
pretrained/pretrain: 預訓練
Python data structures (e.g., list, set, dict): 翻譯為串列,集合,字典,並用括號標註原英文
1. **[ALBERT](https://huggingface.co/docs/transformers/model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/docs/transformers/model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](https://huggingface.co/docs/transformers/model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](https://huggingface.co/docs/transformers/model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](https://huggingface.co/docs/transformers/model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](https://huggingface.co/docs/transformers/model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/docs/transformers/model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](https://huggingface.co/docs/transformers/model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](https://huggingface.co/docs/transformers/model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](https://huggingface.co/docs/transformers/model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](https://huggingface.co/docs/transformers/model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](https://huggingface.co/docs/transformers/model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](https://huggingface.co/docs/transformers/model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](https://huggingface.co/docs/transformers/model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](https://huggingface.co/docs/transformers/model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](https://huggingface.co/docs/transformers/model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](https://huggingface.co/docs/transformers/model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](https://huggingface.co/docs/transformers/model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](https://huggingface.co/docs/transformers/model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](https://huggingface.co/docs/transformers/model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](https://huggingface.co/docs/transformers/model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](https://huggingface.co/docs/transformers/model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](https://huggingface.co/docs/transformers/model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](https://huggingface.co/docs/transformers/model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](https://huggingface.co/docs/transformers/model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](https://huggingface.co/docs/transformers/model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DeiT](https://huggingface.co/docs/transformers/model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](https://huggingface.co/docs/transformers/model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](https://huggingface.co/docs/transformers/model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/docs/transformers/model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/distillation) and a German version of DistilBERT.
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/main/model_doc/donut)** (from NAVER) released with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPR](https://huggingface.co/docs/transformers/model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](https://huggingface.co/docs/transformers/model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](https://huggingface.co/docs/transformers/model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](https://huggingface.co/docs/transformers/model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](https://huggingface.co/docs/transformers/model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](https://huggingface.co/docs/transformers/model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](https://huggingface.co/docs/transformers/model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released with the paper [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GroupViT](https://huggingface.co/docs/transformers/model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](https://huggingface.co/docs/transformers/model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](https://huggingface.co/docs/transformers/model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](https://huggingface.co/docs/transformers/model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](https://huggingface.co/docs/transformers/model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](https://huggingface.co/docs/transformers/model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](https://huggingface.co/docs/transformers/model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](https://huggingface.co/docs/transformers/model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](https://huggingface.co/docs/transformers/model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](https://huggingface.co/docs/transformers/model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](https://huggingface.co/docs/transformers/model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](https://huggingface.co/docs/transformers/model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](https://huggingface.co/docs/transformers/model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](https://huggingface.co/docs/transformers/model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](https://huggingface.co/docs/transformers/model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](https://huggingface.co/docs/transformers/model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](https://huggingface.co/docs/transformers/model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov
1. **[mBART](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](https://huggingface.co/docs/transformers/model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](https://huggingface.co/docs/transformers/model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](https://huggingface.co/docs/transformers/model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](https://huggingface.co/docs/transformers/model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](https://huggingface.co/docs/transformers/model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](https://huggingface.co/docs/transformers/model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](https://huggingface.co/docs/transformers/model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](https://huggingface.co/docs/transformers/model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[MVP](https://huggingface.co/docs/transformers/model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](https://huggingface.co/docs/transformers/model_doc/nezha)** (from Huawei Noah’s Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](https://huggingface.co/docs/transformers/model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](https://huggingface.co/docs/transformers/master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](https://huggingface.co/docs/transformers/model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](https://huggingface.co/docs/transformers/model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](https://huggingface.co/docs/transformers/model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](https://huggingface.co/docs/transformers/model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](https://huggingface.co/docs/transformers/model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](https://huggingface.co/docs/transformers/model_doc/regnet)** (from META Research) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](https://huggingface.co/docs/transformers/model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/pdf/2010.12821.pdf) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](https://huggingface.co/docs/transformers/model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta)** (from Facebook), released together with the paper a [Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](https://huggingface.co/docs/transformers/model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper a [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/pdf/2104.09864v1.pdf) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](https://huggingface.co/docs/transformers/model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](https://huggingface.co/docs/transformers/model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer](https://huggingface.co/docs/transformers/model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](https://huggingface.co/docs/transformers/model_doc/speech_to_text_2)** (from Facebook) released with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](https://huggingface.co/docs/transformers/model_doc/splinter)** (from Tel Aviv University) released with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBERT](https://huggingface.co/docs/transformers/model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](https://huggingface.co/docs/transformers/model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](https://huggingface.co/docs/transformers/main/model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](https://huggingface.co/docs/transformers/model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1)** (from Google AI) released with the paper [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](https://huggingface.co/docs/transformers/model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](https://huggingface.co/docs/transformers/model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](https://huggingface.co/docs/transformers/model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](https://huggingface.co/docs/transformers/model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](https://huggingface.co/docs/transformers/model_doc/trocr)** (from Microsoft) released with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](https://huggingface.co/docs/transformers/model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](https://huggingface.co/docs/transformers/model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](https://huggingface.co/docs/transformers/model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](https://huggingface.co/docs/transformers/model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/pdf/2202.09741.pdf) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](https://huggingface.co/docs/transformers/main/model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](https://huggingface.co/docs/transformers/model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](https://huggingface.co/docs/transformers/model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](https://huggingface.co/docs/transformers/model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](https://huggingface.co/docs/transformers/model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](https://huggingface.co/docs/transformers/model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](https://huggingface.co/docs/transformers/model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](https://huggingface.co/docs/transformers/model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](https://huggingface.co/docs/transformers/model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/docs/transformers/model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl)** (from Facebook AI) released with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](https://huggingface.co/docs/transformers/model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](https://huggingface.co/docs/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](https://huggingface.co/docs/transformers/model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
title = "Transformers: State-of-the-Art Natural Language Processing",
author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
month = oct,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
A folder called ``_build/html`` should have been created. You can now open the file ``_build/html/index.html`` in your
browser.
You can adapt the `--build_dir` to set any temporary folder that you prefer. This command will create it and generate
the MDX files that will be rendered as the documentation on the main website. You can inspect them in your favorite
Markdown editor.
## Previewing the documentation
To preview the docs, first install the `watchdog` module with:
```bash
pip install watchdog
```
Then run the following command:
```bash
doc-builder preview {package_name}{path_to_docs}
```
For example:
```bash
doc-builder preview transformers docs/source/en/
```
The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment to a link where the documentation with your changes lives.
---
**NOTE**
If you are adding/removing elements from the toc-tree or from any structural item, it is recommended to clean the build
directory before rebuilding. Run the following command to clean and build:
```bash
make clean && make html
```
The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml`& restart `preview`command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
---
It should build the static app that will be available under `/docs/_build/html`
## Adding a new element to the navigation bar
## Adding a new element to the tree (toc-tree)
Accepted files are Markdown (.md or .mdx).
Accepted files are reStructuredText (.rst) and Markdown (.md). Create a file with its extension and put it
in the source directory. You can then link it to the toc-tree by putting the filename without the extension.
Create a file with its extension and put it in the source directory. You can then link it to the toc-tree by putting
the filename without the extension in the [`_toctree.yml`](https://github.com/huggingface/transformers/blob/main/docs/source/_toctree.yml) file.
## Preview the documentation in a pull request
## Renaming section headers and moving sections
Once you have made your pull request, you can check what the documentation will look like after it's merged by
following these steps:
It helps to keep the old links working when renaming section header and/or moving sections from one document to another. This is because the old links are likely to be used in Issues, Forums and Social media and it'd be make for a much more superior user experience if users reading those months later could still easily navigate to the originally intended information.
Therefore we simply keep a little map of moved sections at the end of the document where the original section was. The key is to preserve the original anchor.
So if you renamed a section from: "Section A" to "Section B", then you can add at the end of the file:
Use the relative style to link to the new file so that the versioned docs continue to work.
For an example of a rich moved sections set please see the very end of [the Trainer doc](https://github.com/huggingface/transformers/blob/main/docs/source/main_classes/trainer.mdx).
- Look at the checks at the bottom of the conversation page of your PR (you may need to click on "show all checks" to
expand them).
- Click on "details" next to the `ci/circleci: build_doc` check.
- In the new window, click on the "Artifacts" tab.
- Locate the file "docs/_build/html/index.html" (or any specific page you want to check) and click on it to get a
preview.
## Writing Documentation - Specification
The `huggingface/transformers` documentation follows the
[Google documentation](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) style. It is
If you just want to add a method that is not documented (for instance magic method like `__call__` are not documented
byt default) you can put the list of methods to add in a list that contains `all`:
```
## XXXTokenizer
[[autodoc]] XXXTokenizer
- all
- __call__
```
### Writing source documentation
Values that should be put in `code` should either be surrounded by double backticks: \`\`like so\`\` or be written as
an object using the :obj: syntax: :obj:\`like so\`. Note that argument names and objects like True, None or any strings
should usually be put in `code`.
Values that should be put in `code` should either be surrounded by backticks: \`like so\`. Note that argument names
and objects like True, None or any strings should usually be put in `code`.
When mentionning a class, it is recommended to use the :class: syntax as the mentioned class will be automatically
linked by Sphinx: :class:\`~transformers.XXXClass\`
When mentioning a class, function or method, it is recommended to use our syntax for internal links so that our tool
adds a link to its documentation with this syntax: \[\`XXXClass\`\] or \[\`function\`\]. This requires the class or
function to be in the main package.
When mentioning a function, it is recommended to use the :func: syntax as the mentioned function will be automatically
linked by Sphinx: :func:\`~transformers.function\`.
If you want to create a link to some internal class or function, you need to
provide its path. For instance: \[\`utils.ModelOutput\`\]. This will be converted into a link with
`utils.ModelOutput` in the description. To get rid of the path and only keep the name of the object you are
linking to in the description, add a ~: \[\`~utils.ModelOutput\`\] will generate a link with `ModelOutput` in the description.
When mentioning a method, it is recommended to use the :meth: syntax as the mentioned method will be automatically
linked by Sphinx: :meth:\`~transformers.XXXClass.method\`.
Links should be done as so (note the double underscore at the end): \`text for the link <./local-link-or-global-link#loc>\`__
The same works for methods so you can either use \[\`XXXClass.method\`\] or \[~\`XXXClass.method\`\].
#### Defining arguments in a method
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
The argument should be followed by its type, with its shape if it is a tensor, and a line return.
Another indentation is necessary before writing the description of the argument.
Arguments should be defined with the `Args:` (or `Arguments:` or `Parameters:`) prefix, followed by a line return and
an indentation. The argument should be followed by its type, with its shape if it is a tensor, a colon and its
description:
```
Args:
n_layers (`int`): The number of layers of the model.
```
If the description is too long to fit in one line, another indentation is necessary before writing the description
after th argument.
Here's an example showcasing everything so far:
```
Args:
input_ids (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`):
input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using :class:`~transformers.AlbertTokenizer`.
See :meth:`~transformers.PreTrainedTokenizer.encode` and
:meth:`~transformers.PreTrainedTokenizer.__call__` for details.
Indices can be obtained using [`AlbertTokenizer`]. See [`~PreTrainedTokenizer.encode`] and
[`~PreTrainedTokenizer.__call__`] for details.
`What are input IDs? <../glossary.html#input-ids>`__
[What are input IDs?](../glossary#input-ids)
```
For optional arguments or arguments with defaults we follow the following syntax: imagine we have a function with the
@ -192,53 +242,190 @@ then its documentation should look like this:
```
Args:
x (:obj:`str`, `optional`):
x (`str`, *optional*):
This argument controls ...
a (:obj:`float`, `optional`, defaults to 1):
a (`float`, *optional*, defaults to 1):
This argument is used to ...
```
Note that we always omit the "defaults to :obj:\`None\`" when None is the default for any argument. Also note that even
Note that we always omit the "defaults to \`None\`" when None is the default for any argument. Also note that even
if the first line describing your argument type and its default gets long, you can't break it on several lines. You can
however write as many lines as you want in the indented description (see the example above with `input_ids`).
however write as many lines as you want in the indented description (see the example above with `input_ids`).
#### Writing a multi-line code block
#### Writing a multi-line code block
Multi-line code blocks can be useful for displaying examples. They are done like so:
Multi-line code blocks can be useful for displaying examples. They are done between two lines of three backticks as usual in Markdown:
````
```
Example::
# first line of code
# second line
# etc
# first line of code
# second line
# etc
```
The `Example` string at the beginning can be replaced by anything as long as there are two semicolons following it.
````
We follow the [doctest](https://docs.python.org/3/library/doctest.html) syntax for the examples to automatically test
the results stay consistent with the library.
#### Writing a return block
Arguments should be defined with the `Args:` prefix, followed by a line return and an indentation.
The return block should be introduced with the `Returns:` prefix, followed by a line return and an indentation.
The first line should be the type of the return, followed by a line return. No need to indent further for the elements
building the return.
Here's an example for tuple return, comprising several objects:
```
Returns:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
loss (`optional`, returned when ``masked_lm_labels`` is provided) ``torch.FloatTensor`` of shape ``(1,)``:
Total loss as the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
```
Here's an example for a single value return:
```
Returns:
:obj:`List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
`List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
```
Here's an example for tuple return, comprising several objects:
```
Returns:
`tuple(torch.FloatTensor)` comprising various elements depending on the configuration ([`BertConfig`]) and inputs:
- ** loss** (*optional*, returned when `masked_lm_labels` is provided) `torch.FloatTensor` of shape `(1,)` --
Total loss as the sum of the masked language modeling loss and the next sequence prediction (classification) loss.
- **prediction_scores** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) --
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
```
#### Adding an image
Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference
them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
to this dataset.
## Styling the docstring
We have an automatic script running with the `make style` comment that will make sure that:
- the docstrings fully take advantage of the line width
- all code examples are formatted using black, like the code of the Transformers library
This script may have some weird failures if you made a syntax mistake or if you uncover a bug. Therefore, it's
recommended to commit your changes before running `make style`, so you can revert the changes done by that script
easily.
# Testing documentation examples
Good documentation oftens comes with an example of how a specific function or class should be used.
Each model class should contain at least one example showcasing
how to use this model class in inference. *E.g.* the class [Wav2Vec2ForCTC](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2ForCTC)
includes an example of how to transcribe speech to text in the
[docstring of its forward function](https://huggingface.co/docs/transformers/model_doc/wav2vec2#transformers.Wav2Vec2ForCTC.forward).
## Writing documenation examples
The syntax for Example docstrings can look as follows:
```
Example:
```python
>>> from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
'MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL'
```
```
The docstring should give a minimal, clear example of how the respective model
is to be used in inference and also include the expected (ideally sensible)
output.
Often, readers will try out the example before even going through the function
or class definitions. Therefore it is of utmost importance that the example
works as expected.
## Docstring testing
To do so each example should be included in the doctests.
We use pytests' [doctest integration](https://docs.pytest.org/doctest.html) to verify that all of our examples run correctly.
For Transformers, the doctests are run on a daily basis via GitHub Actions as can be
seen [here](https://github.com/huggingface/transformers/actions/workflows/doctests.yml).
To include your example in the daily doctests, you need add the filename that
contains the example docstring to the [documentation_tests.txt](../utils/documentation_tests.txt).
### For Python files
You will first need to run the following command (from the root of the repository) to prepare the doc file (doc-testing needs to add additional lines that we don't include in the doc source files):
```bash
python utils/prepare_for_doc_test.py src docs
```
If you work on a specific python module, say `modeling_wav2vec2.py`, you can run the command as follows (to avoid the unnecessary temporary changes in irrelevant files):
Then you can run all the tests in the docstrings of a given file with the following command, here is how we test the modeling file of Wav2Vec2 for instance:
If you want to isolate a specific docstring, just add `::` after the file name then type the whole path of the function/class/method whose docstring you want to test. For instance, here is how to just test the forward method of `Wav2Vec2ForCTC`:
Once you're done, you can run the following command (still from the root of the repository) to undo the changes made by the first command before committing:
You will first need to run the following command (from the root of the repository) to prepare the doc file (doc-testing needs to add additional lines that we don't include in the doc source files):
```bash
python utils/prepare_for_doc_test.py src docs
```
Then you can test locally a given file with this command (here testing the quicktour):
Once you're done, you can run the following command (still from the root of the repository) to undo the changes made by the first command before committing:
Here are a few tips to help you debug the doctests and make them pass:
- The outputs of the code need to match the expected output **exactly**, so make sure you have the same outputs. In particular doctest will see a difference between single quotes and double quotes, or a missing parenthesis. The only exceptions to that rule are:
* whitespace: one give whitespace (space, tabulation, new line) is equivalent to any number of whitespace, so you can add new lines where there are spaces to make your output more readable.
* numerical values: you should never put more than 4 or 5 digits to expected results as different setups or library versions might get you slightly different results. `doctest` is configure to ignore any difference lower than the precision to which you wrote (so 1e-4 if you write 4 digits).
- Don't leave a block of code that is very long to execute. If you can't make it fast, you can either not use the doctest syntax on it (so that it's ignored), or if you want to use the doctest syntax to show the results, you can add a comment `# doctest: +SKIP` at the end of the lines of code too long to execute
- Each line of code that produces a result needs to have that result written below. You can ignore an output if you don't want to show it in your code example by adding a comment ` # doctest: +IGNORE_RESULT` at the end of the line of code producing it.
### Translating the Transformers documentation into your language
As part of our mission to democratize machine learning, we'd love to make the Transformers library available in many more languages! Follow the steps below if you want to help translate the documentation into your language 🙏.
**🗞️ Open an issue**
To get started, navigate to the [Issues](https://github.com/huggingface/transformers/issues) page of this repo and check if anyone else has opened an issue for your language. If not, open a new issue by selecting the "Translation template" from the "New issue" button.
Once an issue exists, post a comment to indicate which chapters you'd like to work on, and we'll add your name to the list.
**🍴 Fork the repository**
First, you'll need to [fork the Transformers repo](https://docs.github.com/en/get-started/quickstart/fork-a-repo). You can do this by clicking on the **Fork** button on the top-right corner of this repo's page.
Once you've forked the repo, you'll want to get the files on your local machine for editing. You can do that by cloning the fork with Git as follows:
**📋 Copy-paste the English version with a new language code**
The documentation files are in one leading directory:
- [`docs/source`](https://github.com/huggingface/transformers/tree/main/docs/source): All the documentation materials are organized here by language.
You'll only need to copy the files in the [`docs/source/en`](https://github.com/huggingface/transformers/tree/main/docs/source/en) directory, so first navigate to your fork of the repo and run the following:
```bash
cd ~/path/to/transformers/docs
cp -r source/en source/LANG-ID
```
Here, `LANG-ID` should be one of the ISO 639-1 or ISO 639-2 language codes -- see [here](https://www.loc.gov/standards/iso639-2/php/code_list.php) for a handy table.
**✍️ Start translating**
The fun part comes - translating the text!
The first thing we recommend is translating the part of the `_toctree.yml` file that corresponds to your doc chapter. This file is used to render the table of contents on the website.
> 🙋 If the `_toctree.yml` file doesn't yet exist for your language, you can create one by copy-pasting from the English version and deleting the sections unrelated to your chapter. Just make sure it exists in the `docs/source/LANG-ID/` directory!
The fields you should add are `local` (with the name of the file containing the translation; e.g. `autoclass_tutorial`), and `title` (with the title of the doc in your language; e.g. `Load pretrained instances with an AutoClass`) -- as a reference, here is the `_toctree.yml` for [English](https://github.com/huggingface/transformers/blob/main/docs/source/en/_toctree.yml):
```yaml
- sections:
- local:pipeline_tutorial# Do not change this! Use the same name for your .md file
title:Pipelines for inference# Translate this!
...
title:Tutorials# Translate this!
```
Once you have translated the `_toctree.yml` file, you can start translating the [MDX](https://mdxjs.com/) files associated with your docs chapter.
> 🙋 If you'd like others to help you with the translation, you can either [open an issue](https://github.com/huggingface/transformers/issues) or tag @[espejelomar](https://twitter.com/espejelomar)
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# 🤗 Transformers
Maschinelles Lernen auf dem neuesten Stand der Technik für PyTorch, TensorFlow und JAX.
🤗 Transformers bietet APIs zum einfachen Herunterladen und Trainieren von vortrainierten Modellen auf dem neuesten Stand der Technik. Die Verwendung von vortrainierten Modellen kann Rechenkosten sparen und den CO2-Fußabdruck reduzieren und Zeit sparen, die für das Training eines Modells von Grund auf benötigt wird. Die Modelle können für verschiedene Modalitäten verwendet werden, wie z. B.:
* 📝 Text: Textklassifizierung, Informationsextrahierung, Beantwortung von Fragen, Zusammenfassung, Übersetzung und Texterstellung in über 100 Sprachen.
* 🖼️ Bilder: Bildklassifizierung, Objekterkennung und Segmentierung.
* 🗣️ Audio: Spracherkennung und Audioklassifizierung.
* 🐙 Multimodal: Beantwortung von Tabellenfragen, optische Zeichenerkennung, Informationsextraktion aus gescannten Dokumenten, Videoklassifizierung und Beantwortung visueller Fragen.
Unsere Bibliothek unterstützt die nahtlose Integration von drei der beliebtesten Deep-Learning-Bibliotheken: [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/) und [JAX](https://jax.readthedocs.io/en/latest/). Trainieren Sie Ihr Modell in drei Codezeilen in einem Framework und laden Sie es zur Inferenz mit einem anderen.
Jede 🤗 Transformers-Architektur ist in einem eigenständigen Python-Modul definiert, so dass sie leicht für Forschung und Experimente angepasst werden kann.
## Wenn Sie auf der Suche nach individueller Unterstützung durch das Hugging Face-Team sind
- **GET STARTED** enthält eine kurze Tour und Installationsanweisungen, um mit 🤗 Transformers loszulegen.
- **TUTORIALS** sind ein hervorragender Ausgangspunkt, wenn Sie neu in unserer Bibliothek sind. Dieser Abschnitt hilft Ihnen, die grundlegenden Fähigkeiten zu erlangen, die Sie benötigen, um mit 🤗 Transformers zu arbeiten.
- **HOW-TO GUIDES** zeigen Ihnen, wie Sie ein bestimmtes Ziel erreichen können, z. B. die Feinabstimmung eines vortrainierten Modells für die Sprachmodellierung oder die Erstellung eines benutzerdefinierten Modellkopfs.
- **KONZEPTUELLE ANLEITUNGEN** bietet weitere Diskussionen und Erklärungen zu den zugrunde liegenden Konzepten und Ideen hinter Modellen, Aufgaben und der Designphilosophie von 🤗 Transformers.
- **API** beschreibt jede Klasse und Funktion, gruppiert in:
- **MAIN CLASSES** für die Hauptklassen, die die wichtigsten APIs der Bibliothek darstellen.
- MODELLE** für die Klassen und Funktionen, die zu jedem in der Bibliothek implementierten Modell gehören.
- **INTERNAL HELPERS** für die Klassen und Funktionen, die wir intern verwenden.
Die Bibliothek enthält derzeit JAX-, PyTorch- und TensorFlow-Implementierungen, vortrainierte Modellgewichte, Nutzungsskripte und Konvertierungsprogramme für die folgenden Modelle.
### Unterstütze Modelle
<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! -->
1. **[ALBERT](model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GroupViT](model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[mBART](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[MVP](model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](model_doc/nezha)** (from Huawei Noah’s Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer](model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBERT](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOLOS](model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
### Unterstützte Frameworks
Die folgende Tabelle zeigt die derzeitige Unterstützung in der Bibliothek für jedes dieser Modelle, unabhängig davon, ob sie einen Python
Tokenizer haben (als "langsam" bezeichnet), ein "schneller" Tokenizer, der von der 🤗 Tokenizers Bibliothek unterstützt wird, ob sie Unterstützung in Jax (via
Flax), PyTorch, und/oder TensorFlow haben.
<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!-->
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Installation
Installieren Sie 🤗 Transformers für die Deep-Learning-Bibliothek, mit der Sie arbeiten, richten Sie Ihren Cache ein und konfigurieren Sie 🤗 Transformers optional für den Offline-Betrieb.
🤗 Transformers wurde unter Python 3.6+, PyTorch 1.1.0+, TensorFlow 2.0+, und Flax getestet. Folgen Sie den Installationsanweisungen unten für die von Ihnen verwendete Deep-Learning-Bibliothek:
Sie sollten 🤗 Transformers in einer [virtuellen Umgebung](https://docs.python.org/3/library/venv.html) installieren. Wenn Sie mit virtuellen Python-Umgebungen nicht vertraut sind, werfen Sie einen Blick auf diese [Anleitung](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Eine virtuelle Umgebung macht es einfacher, verschiedene Projekte zu verwalten und Kompatibilitätsprobleme zwischen Abhängigkeiten zu vermeiden.
Beginnen wir mit der Erstellung einer virtuellen Umgebung in Ihrem Projektverzeichnis:
```bash
python -m venv .env
```
Aktivieren wir die virtuelle Umgebung. Unter Linux und MacOs:
```bash
source .env/bin/activate
```
Aktivieren wir die virtuelle Umgebung unter Windows
```bash
.env/Scripts/activate
```
Jetzt können wir die 🤗 Transformers mit dem folgenden Befehl installieren:
```bash
pip install transformers
```
Bei reiner CPU-Unterstützung können wir 🤗 Transformers und eine Deep-Learning-Bibliothek bequem in einer Zeile installieren. Installieren wir zum Beispiel 🤗 Transformers und PyTorch mit:
```bash
pip install transformers[torch]
```
🤗 Transformers und TensorFlow 2.0:
```bash
pip install transformers[tf-cpu]
```
🤗 Transformers und Flax:
```bash
pip install transformers[flax]
```
Überprüfen wir abschließend, ob 🤗 Transformers ordnungsgemäß installiert wurde, indem wir den folgenden Befehl ausführen. Es wird ein vortrainiertes Modell heruntergeladen:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
```
Dann wird die Kategorie und die Wahrscheinlichkeit ausgegeben:
Dieser Befehl installiert die aktuelle `main` Version und nicht die neueste `stable` Version. Die `main`-Version ist nützlich, um mit den neuesten Entwicklungen Schritt zu halten. Zum Beispiel, wenn ein Fehler seit der letzten offiziellen Version behoben wurde, aber eine neue Version noch nicht veröffentlicht wurde. Das bedeutet jedoch, dass die "Hauptversion" nicht immer stabil ist. Wir bemühen uns, die Hauptversion einsatzbereit zu halten, und die meisten Probleme werden normalerweise innerhalb weniger Stunden oder eines Tages behoben. Wenn Sie auf ein Problem stoßen, öffnen Sie bitte ein [Issue] (https://github.com/huggingface/transformers/issues), damit wir es noch schneller beheben können!
Überprüfen wir, ob 🤗 Transformers richtig installiert wurde, indem Sie den folgenden Befehl ausführen:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I love you'))"
```
## Editierbare Installation
Sie benötigen eine bearbeitbare Installation, wenn Sie:
* die "Haupt"-Version des Quellcodes verwenden möchten.
* Zu 🤗 Transformers beitragen und Änderungen am Code testen wollen.
Klonen Sie das Repository und installieren 🤗 Transformers mit den folgenden Befehlen:
Diese Befehle verknüpfen den Ordner, in den Sie das Repository geklont haben, mit den Pfaden Ihrer Python-Bibliotheken. Python wird nun in dem Ordner suchen, in den Sie geklont haben, zusätzlich zu den normalen Bibliothekspfaden. Wenn zum Beispiel Ihre Python-Pakete normalerweise in `~/anaconda3/envs/main/lib/python3.7/site-packages/` installiert sind, wird Python auch den Ordner durchsuchen, in den Sie geklont haben: `~/transformers/`.
<Tip warning={true}>
Sie müssen den Ordner `transformers` behalten, wenn Sie die Bibliothek weiter verwenden wollen.
</Tip>
Jetzt können Sie Ihren Klon mit dem folgenden Befehl ganz einfach auf die neueste Version von 🤗 Transformers aktualisieren:
```bash
cd ~/transformers/
git pull
```
Ihre Python-Umgebung wird beim nächsten Ausführen die `main`-Version von 🤗 Transformers finden.
## Installation mit conda
Installation von dem conda Kanal `huggingface`:
```bash
conda install -c huggingface transformers
```
## Cache Einrichtung
Vorgefertigte Modelle werden heruntergeladen und lokal zwischengespeichert unter: `~/.cache/huggingface/hub`. Dies ist das Standardverzeichnis, das durch die Shell-Umgebungsvariable "TRANSFORMERS_CACHE" vorgegeben ist. Unter Windows wird das Standardverzeichnis durch `C:\Benutzer\Benutzername\.cache\huggingface\hub` angegeben. Sie können die unten aufgeführten Shell-Umgebungsvariablen - in der Reihenfolge ihrer Priorität - ändern, um ein anderes Cache-Verzeichnis anzugeben:
1. Shell-Umgebungsvariable (Standard): `HUGGINGFACE_HUB_CACHE` oder `TRANSFORMERS_CACHE`.
Transformers verwendet die Shell-Umgebungsvariablen `PYTORCH_TRANSFORMERS_CACHE` oder `PYTORCH_PRETRAINED_BERT_CACHE`, wenn Sie von einer früheren Iteration dieser Bibliothek kommen und diese Umgebungsvariablen gesetzt haben, sofern Sie nicht die Shell-Umgebungsvariable `TRANSFORMERS_CACHE` angeben.
</Tip>
## Offline Modus
Transformers ist in der Lage, in einer Firewall- oder Offline-Umgebung zu laufen, indem es nur lokale Dateien verwendet. Setzen Sie die Umgebungsvariable `TRANSFORMERS_OFFLINE=1`, um dieses Verhalten zu aktivieren.
<Tip>
Fügen sie [🤗 Datasets](https://huggingface.co/docs/datasets/) zu Ihrem Offline-Trainingsworkflow hinzufügen, indem Sie die Umgebungsvariable `HF_DATASETS_OFFLINE=1` setzen.
</Tip>
So würden Sie beispielsweise ein Programm in einem normalen Netzwerk mit einer Firewall für externe Instanzen mit dem folgenden Befehl ausführen:
Das Skript sollte nun laufen, ohne sich aufzuhängen oder eine Zeitüberschreitung abzuwarten, da es weiß, dass es nur nach lokalen Dateien suchen soll.
### Abrufen von Modellen und Tokenizern zur Offline-Verwendung
Eine andere Möglichkeit, 🤗 Transformers offline zu verwenden, besteht darin, die Dateien im Voraus herunterzuladen und dann auf ihren lokalen Pfad zu verweisen, wenn Sie sie offline verwenden müssen. Es gibt drei Möglichkeiten, dies zu tun:
* Laden Sie eine Datei über die Benutzeroberfläche des [Model Hub](https://huggingface.co/models) herunter, indem Sie auf das ↓-Symbol klicken.
>>> model = AutoModel.from_pretrained("./your/path/bigscience_t0")
```
* Programmatisches Herunterladen von Dateien mit der [huggingface_hub](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub) Bibliothek:
1. Installieren Sie die "huggingface_hub"-Bibliothek in Ihrer virtuellen Umgebung:
```bash
python -m pip install huggingface_hub
```
2. Verwenden Sie die Funktion [`hf_hub_download`](https://huggingface.co/docs/hub/adding-a-library#download-files-from-the-hub), um eine Datei in einen bestimmten Pfad herunterzuladen. Der folgende Befehl lädt zum Beispiel die Datei "config.json" aus dem Modell [T0](https://huggingface.co/bigscience/T0_3B) in den gewünschten Pfad herunter:
Weitere Informationen zum Herunterladen von Dateien, die auf dem Hub gespeichert sind, finden Sie im Abschnitt [Wie man Dateien vom Hub herunterlädt] (https://huggingface.co/docs/hub/how-to-downstream).
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Schnellstart
[[open-in-colab]]
Mit 🤗 Transformers können Sie sofort loslegen! Verwenden Sie die [`pipeline`] für schnelle Inferenz und laden Sie schnell ein vortrainiertes Modell und einen Tokenizer mit einer [AutoClass](./model_doc/auto), um Ihre Text-, Bild- oder Audioaufgabe zu lösen.
<Tip>
Alle in der Dokumentation vorgestellten Codebeispiele haben oben links einen Umschalter für PyTorch und TensorFlow. Wenn
nicht, wird erwartet, dass der Code für beide Backends ohne Änderungen funktioniert.
</Tip>
## Pipeline
[`pipeline`] ist der einfachste Weg, ein vortrainiertes Modell für eine bestimmte Aufgabe zu verwenden.
<Youtube id="tiZFewofSLM"/>
Die [`pipeline`] unterstützt viele gängige Aufgaben:
**Text**:
* Stimmungsanalyse: Klassifizierung der Polarität eines gegebenen Textes.
* Textgenerierung (auf Englisch): Generierung von Text aus einer gegebenen Eingabe.
* Name-Entity-Recognition (NER): Kennzeichnung jedes Worts mit der Entität, die es repräsentiert (Person, Datum, Ort usw.).
* Beantwortung von Fragen: Extrahieren der Antwort aus dem Kontext, wenn ein gewisser Kontext und eine Frage gegeben sind.
* Fill-mask: Ausfüllen von Lücken in einem Text mit maskierten Wörtern.
* Zusammenfassung: Erstellung einer Zusammenfassung einer langen Text- oder Dokumentensequenz.
* Übersetzung: Übersetzen eines Textes in eine andere Sprache.
* Merkmalsextraktion: Erstellen einer Tensordarstellung des Textes.
**Bild**:
* Bildklassifizierung: Klassifizierung eines Bildes.
* Bildsegmentierung: Klassifizierung jedes Pixels in einem Bild.
* Objekterkennung: Erkennen von Objekten innerhalb eines Bildes.
**Audio**:
* Audioklassifizierung: Zuweisung eines Labels zu einem bestimmten Audiosegment.
* Automatische Spracherkennung (ASR): Transkription von Audiodaten in Text.
<Tip>
Für mehr Details über die [`pipeline`] und assoziierte Aufgaben, schauen Sie in die Dokumentation [hier](./main_classes/pipelines).
</Tip>
### Verwendung der Pipeline
Im folgenden Beispiel werden Sie die [`pipeline`] für die Stimmungsanalyse verwenden.
Installieren Sie die folgenden Abhängigkeiten, falls Sie dies nicht bereits getan haben:
<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>
Importieren sie die [`pipeline`] und spezifizieren sie die Aufgabe, welche sie lösen möchten:
```py
>>> from transformers import pipeline
>>> classifier = pipeline("sentiment-analysis")
```
Die Pipeline lädt ein standardmäßiges [vortrainiertes Modell] (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) und einen Tokenizer für die Stimmungs-Analyse herunter und speichert sie. Jetzt können Sie den "Klassifikator" auf Ihren Zieltext anwenden:
```py
>>> classifier("We are very happy to show you the 🤗 Transformers library.")
[{'label': 'POSITIVE', 'score': 0.9998}]
```
For more than one sentence, pass a list of sentences to the [`pipeline`] which returns a list of dictionaries:
```py
>>> results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
>>> for result in results:
... print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
```
Die [`pipeline`] kann auch über einen ganzen Datensatz iterieren. Starten wir mit der Installation der [🤗 Datasets](https://huggingface.co/docs/datasets/) Bibliothek:
```bash
pip install datasets
```
Erstellen wir eine [`pipeline`] mit der Aufgabe die wir lösen und dem Modell welches wir nutzen möchten.
Als nächstes laden wir den Datensatz (siehe 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) für mehr Details) welches wir nutzen möchten. Zum Beispiel laden wir den [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) Datensatz:
Audiodateien werden automatisch geladen und neu abgetastet, wenn die Spalte "audio" aufgerufen wird.
Extrahieren wir die rohen Wellenform-Arrays der ersten 4 Beispiele und übergeben wir sie als Liste an die Pipeline:
```py
>>> result = speech_recognizer(dataset[:4]["audio"])
>>> print([d["text"] for d in result])
['I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT', "FODING HOW I'D SET UP A JOIN TO HET WITH MY WIFE AND WHERE THE AP MIGHT BE", "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE AP SO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AND I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS", 'HOW DO I THURN A JOIN A COUNT']
```
Bei einem größeren Datensatz mit vielen Eingaben (wie bei Sprache oder Bildverarbeitung) sollten Sie einen Generator anstelle einer Liste übergeben, der alle Eingaben in den Speicher lädt. Weitere Informationen finden Sie in der [Pipeline-Dokumentation](./main_classes/pipelines).
### Ein anderes Modell und einen anderen Tokenizer in der Pipeline verwenden
Die [`pipeline`] kann jedes Modell aus dem [Model Hub] (https://huggingface.co/models) verwenden, wodurch es einfach ist, die [`pipeline`] für andere Anwendungsfälle anzupassen. Wenn Sie beispielsweise ein Modell wünschen, das französischen Text verarbeiten kann, verwenden Sie die Tags im Model Hub, um nach einem geeigneten Modell zu filtern. Das oberste gefilterte Ergebnis liefert ein mehrsprachiges [BERT-Modell](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment), das auf die Stimmungsanalyse abgestimmt ist. Großartig, verwenden wir dieses Modell!
Use the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and it's associated tokenizer (more on an `AutoClass` below):
```py
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
Use the [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and it's associated tokenizer (more on an `TFAutoClass` below):
```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
[{'label': '5 stars', 'score': 0.7273}]
```
Wenn Sie kein Modell für Ihren Anwendungsfall finden können, müssen Sie ein vortrainiertes Modell auf Ihren Daten feinabstimmen. Schauen Sie sich unser [Feinabstimmungs-Tutorial](./training) an, um zu erfahren, wie das geht. Und schließlich, nachdem Sie Ihr trainiertes Modell verfeinert haben, sollten Sie es mit der Community im Model Hub teilen (siehe Tutorial [hier](./model_sharing)), um NLP für alle zu demokratisieren! 🤗
## AutoClass
<Youtube id="AhChOFRegn4"/>
Unter der Haube arbeiten die Klassen [`AutoModelForSequenceClassification`] und [`AutoTokenizer`] zusammen, um die [`pipeline`] zu betreiben. Eine [`AutoClass`](./model_doc/auto) ist eine Abkürzung, die automatisch die Architektur eines trainierten Modells aus dessen Namen oder Pfad abruft. Sie müssen nur die passende `AutoClass` für Ihre Aufgabe und den zugehörigen Tokenizer mit [`AutoTokenizer`] auswählen.
Kehren wir zu unserem Beispiel zurück und sehen wir uns an, wie Sie die `AutoClass` verwenden können, um die Ergebnisse der [`pipeline`] zu replizieren.
### AutoTokenizer
Ein Tokenizer ist für die Vorverarbeitung von Text in ein für das Modell verständliches Format zuständig. Zunächst zerlegt der Tokenisierer den Text in Wörter, die *Token* genannt werden. Es gibt mehrere Regeln für den Tokenisierungsprozess, z. B. wie und auf welcher Ebene ein Wort aufgespalten wird (weitere Informationen über Tokenisierung [hier](./tokenizer_summary)). Das Wichtigste ist jedoch, dass Sie den Tokenizer mit demselben Modellnamen instanziieren müssen, um sicherzustellen, dass Sie dieselben Tokenisierungsregeln verwenden, mit denen ein Modell zuvor trainiert wurde.
Anschließend wandelt der Tokenizer die Token in Zahlen um, um einen Tensor als Eingabe für das Modell zu konstruieren. Dieser wird als *Vokabular* des Modells bezeichnet.
Übergeben Sie Ihren Text an den Tokenizer:
```py
>>> encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
Der Tokenizer gibt ein Wörterbuch zurück, das Folgendes enthält:
* [input_ids](./glossary#input-ids): numerische Repräsentationen Ihrer Token.
* [atttention_mask](.glossary#attention-mask): gibt an, welche Token beachtet werden sollen.
Genau wie die [`pipeline`] akzeptiert der Tokenizer eine Liste von Eingaben. Darüber hinaus kann der Tokenizer den Text auch auffüllen und kürzen, um einen Stapel mit einheitlicher Länge zurückzugeben:
<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... max_length=512,
... return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... max_length=512,
... return_tensors="tf",
... )
```
</tf>
</frameworkcontent>
Lesen Sie das Tutorial [preprocessing](./preprocessing) für weitere Details zur Tokenisierung.
### AutoModel
<frameworkcontent>
<pt>
🤗 Transformers bietet eine einfache und einheitliche Möglichkeit, vortrainierte Instanzen zu laden. Das bedeutet, dass Sie ein [`AutoModel`] laden können, wie Sie einen [`AutoTokenizer`] laden würden. Der einzige Unterschied ist die Auswahl des richtigen [`AutoModel`] für die Aufgabe. Da Sie eine Text- oder Sequenzklassifizierung vornehmen, laden Sie [`AutoModelForSequenceClassification`]:
```py
>>> from transformers import AutoModelForSequenceClassification
In der [Aufgabenzusammenfassung](./task_summary) steht, welche [AutoModel]-Klasse für welche Aufgabe zu verwenden ist.
</Tip>
Jetzt können Sie Ihren vorverarbeiteten Stapel von Eingaben direkt an das Modell übergeben. Sie müssen nur das Wörterbuch entpacken, indem Sie `**` hinzufügen:
```py
>>> pt_outputs = pt_model(**pt_batch)
```
Das Modell gibt die endgültigen Aktivierungen in dem Attribut "logits" aus. Wenden Sie die Softmax-Funktion auf die "logits" an, um die Wahrscheinlichkeiten zu erhalten:
🤗 Transformers bietet eine einfache und einheitliche Methode zum Laden von vortrainierten Instanzen. Das bedeutet, dass Sie ein [`TFAutoModel`] genauso laden können, wie Sie einen [`AutoTokenizer`] laden würden. Der einzige Unterschied ist die Auswahl des richtigen [`TFAutoModel`] für die Aufgabe. Da Sie Text - oder Sequenz - Klassifizierung machen, laden Sie [`TFAutoModelForSequenceClassification`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
In der [Aufgabenzusammenfassung](./task_summary) steht, welche [AutoModel]-Klasse für welche Aufgabe zu verwenden ist.
</Tip>
Jetzt können Sie Ihren vorverarbeiteten Stapel von Eingaben direkt an das Modell übergeben, indem Sie die Wörterbuchschlüssel direkt an die Tensoren übergeben:
```py
>>> tf_outputs = tf_model(tf_batch)
```
Das Modell gibt die endgültigen Aktivierungen in dem Attribut "logits" aus. Wenden Sie die Softmax-Funktion auf die "logits" an, um die Wahrscheinlichkeiten zu erhalten:
Alle 🤗 Transformers-Modelle (PyTorch oder TensorFlow) geben die Tensoren *vor* der endgültigen Aktivierungsfunktion
Funktion (wie Softmax) aus, da die endgültige Aktivierungsfunktion oft mit dem Verlusten verschmolzen ist.
</Tip>
Modelle sind ein standardmäßiges [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) oder ein [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model), sodass Sie sie in Ihrer üblichen Trainingsschleife verwenden können. Um jedoch die Dinge einfacher zu machen, bietet 🤗 Transformers eine [`Trainer`]-Klasse für PyTorch, die Funktionalität für verteiltes Training, gemischte Präzision und mehr bietet. Für TensorFlow können Sie die Methode `fit` aus [Keras](https://keras.io/) verwenden. Siehe das [training tutorial](./training) für weitere Details.
<Tip>
Transformers-Modellausgaben sind spezielle Datenklassen, so dass ihre Attribute in einer IDE automatisch vervollständigt werden.
Die Modellausgänge verhalten sich auch wie ein Tupel oder ein Wörterbuch (z.B. können Sie mit einem Integer, einem Slice oder einem String indexieren), wobei die Attribute, die "None" sind, ignoriert werden.
</Tip>
### Modell speichern
<frameworkcontent>
<pt>
Sobald Ihr Modell feinabgestimmt ist, können Sie es mit seinem Tokenizer speichern, indem Sie [`PreTrainedModel.save_pretrained`] verwenden:
Ein besonders cooles 🤗 Transformers-Feature ist die Möglichkeit, ein Modell zu speichern und es entweder als PyTorch- oder TensorFlow-Modell wieder zu laden. Der Parameter "from_pt" oder "from_tf" kann das Modell von einem Framework in das andere konvertieren:
Sie können die Konfigurationsklasse des Modells ändern, um zu bestimmen, wie ein Modell aufgebaut ist. Die Konfiguration legt die Attribute eines Modells fest, z. B. die Anzahl der verborgenen Schichten oder der Aufmerksamkeitsköpfe. Wenn Sie ein Modell aus einer benutzerdefinierten Konfigurationsklasse initialisieren, beginnen Sie bei Null. Die Modellattribute werden zufällig initialisiert, und Sie müssen das Modell trainieren, bevor Sie es verwenden können, um aussagekräftige Ergebnisse zu erhalten.
Beginnen Sie mit dem Import von [`AutoConfig`] und laden Sie dann das trainierte Modell, das Sie ändern möchten. Innerhalb von [`AutoConfig.from_pretrained`] können Sie das Attribut angeben, das Sie ändern möchten, z. B. die Anzahl der Aufmerksamkeitsköpfe:
Create a model from your custom configuration with [`AutoModel.from_config`]:
```py
>>> from transformers import AutoModel
>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
Create a model from your custom configuration with [`TFAutoModel.from_config`]:
```py
>>> from transformers import TFAutoModel
>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>
Weitere Informationen zur Erstellung von benutzerdefinierten Konfigurationen finden Sie in der Anleitung [Erstellen einer benutzerdefinierten Architektur](./create_a_model).
## Wie geht es weiter?
Nachdem Sie nun die 🤗 Transformers-Kurztour abgeschlossen haben, schauen Sie sich unsere Anleitungen an und erfahren Sie, wie Sie spezifischere Dinge tun können, wie das Schreiben eines benutzerdefinierten Modells, die Feinabstimmung eines Modells für eine Aufgabe und wie man ein Modell mit einem Skript trainiert. Wenn Sie mehr über die Kernkonzepte von 🤗 Transformers erfahren möchten, nehmen Sie sich eine Tasse Kaffee und werfen Sie einen Blick auf unsere konzeptionellen Leitfäden!
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Distributed training with 🤗 Accelerate
As models get bigger, parallelism has emerged as a strategy for training larger models on limited hardware and accelerating training speed by several orders of magnitude. At Hugging Face, we created the [🤗 Accelerate](https://huggingface.co/docs/accelerate) library to help users easily train a 🤗 Transformers model on any type of distributed setup, whether it is multiple GPU's on one machine or multiple GPU's across several machines. In this tutorial, learn how to customize your native PyTorch training loop to enable training in a distributed environment.
## Setup
Get started by installing 🤗 Accelerate:
```bash
pip install accelerate
```
Then import and create an [`~accelerate.Accelerator`] object. The [`~accelerate.Accelerator`] will automatically detect your type of distributed setup and initialize all the necessary components for training. You don't need to explicitly place your model on a device.
```py
>>> from accelerate import Accelerator
>>> accelerator = Accelerator()
```
## Prepare to accelerate
The next step is to pass all the relevant training objects to the [`~accelerate.Accelerator.prepare`] method. This includes your training and evaluation DataLoaders, a model and an optimizer:
- batch = {k: v.to(device) for k, v in batch.items()}
outputs = model(**batch)
loss = outputs.loss
- loss.backward()
+ accelerator.backward(loss)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
progress_bar.update(1)
```
## Train
Once you've added the relevant lines of code, launch your training in a script or a notebook like Colaboratory.
### Train with a script
If you are running your training from a script, run the following command to create and save a configuration file:
```bash
accelerate config
```
Then launch your training with:
```bash
accelerate launch train.py
```
### Train with a notebook
🤗 Accelerate can also run in a notebook if you're planning on using Colaboratory's TPUs. Wrap all the code responsible for training in a function, and pass it to [`~accelerate.notebook_launcher`]:
```py
>>> from accelerate import notebook_launcher
>>> notebook_launcher(training_function)
```
For more information about 🤗 Accelerate and it's rich features, refer to the [documentation](https://huggingface.co/docs/accelerate).
If selected, you will then work closely with one member of the Hugging Face team to integrate the model into 🤗
Transformers. By doing so, you will both gain a theoretical and deep practical understanding of the proposed model. But
more importantly, you will have made a major open-source contribution to 🤗 Transformers. Along the way, you will:
- get insights into open-source best practices
- understand the design principles of one of the most popular NLP libraries
- learn how to do efficiently test large NLP models
- learn how to integrate Python utilities like `black`, `isort`, `make fix-copies` into a library to always
ensure clean and readable code
We are also more than happy if you want to add a model that cannot be found in the “calls-for-model-addition” folder.
The following sections explain in detail how to add a new model. It might also be very helpful to check out already
added models to see if those resemble the model you would like to add [here](https://github.com/huggingface/transformers/pulls?q=is%3Apr+label%3A%22PR+for+Model+Addition%22+is%3Aclosed).
To start, let's try to get a general overview of the Transformers library.
## General overview of 🤗 Transformers
First, you should get a general overview of 🤗 Transformers. 🤗 Transformers is a very opinionated library, so there is a
chance that you don't agree with some of the library's philosophies or design choices. From our experience, however, we
found that the fundamental design choices and philosophies of the library are crucial to efficiently scale 🤗
Transformers while keeping maintenance costs at a reasonable level.
A good first starting point to better understand the library is to read the [documentation of our philosophy](philosophy). As a result of our way of working, there are some choices that we try to apply to all models:
- Composition is generally favored over-abstraction
- Duplicating code is not always bad if it strongly improves the readability or accessibility of a model
- Model files are as self-contained as possible so that when you read the code of a specific model, you ideally only
have to look into the respective `modeling_....py` file.
In our opinion, the library's code is not just a means to provide a product, *e.g.* the ability to use BERT for
inference, but also as the very product that we want to improve. Hence, when adding a model, the user is not only the
person that will use your model, but also everybody that will read, try to understand, and possibly tweak your code.
With this in mind, let's go a bit deeper into the general library design.
### Overview of models
To successfully add a model, it is important to understand the interaction between your model and its config,
[`PreTrainedModel`], and [`PretrainedConfig`]. For exemplary purposes, we will
call the model to be added to 🤗 Transformers `BrandNewBert`.
As you can see, we do make use of inheritance in 🤗 Transformers, but we keep the level of abstraction to an absolute
minimum. There are never more than two levels of abstraction for any model in the library. `BrandNewBertModel`
inherits from `BrandNewBertPreTrainedModel` which in turn inherits from [`PreTrainedModel`] and
that's it. As a general rule, we want to make sure that a new model only depends on
[`PreTrainedModel`]. The important functionalities that are automatically provided to every new
model are [`~PreTrainedModel.from_pretrained`] and
[`~PreTrainedModel.save_pretrained`], which are used for serialization and deserialization. All of the
other important functionalities, such as `BrandNewBertModel.forward` should be completely defined in the new
`modeling_brand_new_bert.py` script. Next, we want to make sure that a model with a specific head layer, such as
`BrandNewBertForMaskedLM` does not inherit from `BrandNewBertModel`, but rather uses `BrandNewBertModel`
as a component that can be called in its forward pass to keep the level of abstraction low. Every new model requires a
configuration class, called `BrandNewBertConfig`. This configuration is always stored as an attribute in
[`PreTrainedModel`], and thus can be accessed via the `config` attribute for all classes
inheriting from `BrandNewBertPreTrainedModel`:
```python
model = BrandNewBertModel.from_pretrained("brandy/brand_new_bert")
model.config # model has access to its config
```
Similar to the model, the configuration inherits basic serialization and deserialization functionalities from
[`PretrainedConfig`]. Note that the configuration and the model are always serialized into two
different formats - the model to a *pytorch_model.bin* file and the configuration to a *config.json* file. Calling
[`~PreTrainedModel.save_pretrained`] will automatically call
[`~PretrainedConfig.save_pretrained`], so that both model and configuration are saved.
### Code style
When coding your new model, keep in mind that Transformers is an opinionated library and we have a few quirks of our
own regarding how code should be written :-)
1. The forward pass of your model should be fully written in the modeling file while being fully independent of other
models in the library. If you want to reuse a block from another model, copy the code and paste it with a
`# Copied from` comment on top (see [here](https://github.com/huggingface/transformers/blob/v4.17.0/src/transformers/models/roberta/modeling_roberta.py#L160)
for a good example).
2. The code should be fully understandable, even by a non-native English speaker. This means you should pick
descriptive variable names and avoid abbreviations. As an example, `activation` is preferred to `act`.
One-letter variable names are strongly discouraged unless it's an index in a for loop.
3. More generally we prefer longer explicit code to short magical one.
4. Avoid subclassing `nn.Sequential` in PyTorch but subclass `nn.Module` and write the forward pass, so that anyone
using your code can quickly debug it by adding print statements or breaking points.
5. Your function signature should be type-annotated. For the rest, good variable names are way more readable and
understandable than type annotations.
### Overview of tokenizers
Not quite ready yet :-( This section will be added soon!
## Step-by-step recipe to add a model to 🤗 Transformers
Everyone has different preferences of how to port a model so it can be very helpful for you to take a look at summaries
of how other contributors ported models to Hugging Face. Here is a list of community blog posts on how to port a model:
1. [Porting GPT2 Model](https://medium.com/huggingface/from-tensorflow-to-pytorch-265f40ef2a28) by [Thomas](https://huggingface.co/thomwolf)
2. [Porting WMT19 MT Model](https://huggingface.co/blog/porting-fsmt) by [Stas](https://huggingface.co/stas)
From experience, we can tell you that the most important things to keep in mind when adding a model are:
- Don't reinvent the wheel! Most parts of the code you will add for the new 🤗 Transformers model already exist
somewhere in 🤗 Transformers. Take some time to find similar, already existing models and tokenizers you can copy
from. [grep](https://www.gnu.org/software/grep/) and [rg](https://github.com/BurntSushi/ripgrep) are your
friends. Note that it might very well happen that your model's tokenizer is based on one model implementation, and
your model's modeling code on another one. *E.g.* FSMT's modeling code is based on BART, while FSMT's tokenizer code
is based on XLM.
- It's more of an engineering challenge than a scientific challenge. You should spend more time on creating an
efficient debugging environment than trying to understand all theoretical aspects of the model in the paper.
- Ask for help, when you're stuck! Models are the core component of 🤗 Transformers so that we at Hugging Face are more
than happy to help you at every step to add your model. Don't hesitate to ask if you notice you are not making
progress.
In the following, we try to give you a general recipe that we found most useful when porting a model to 🤗 Transformers.
The following list is a summary of everything that has to be done to add a model and can be used by you as a To-Do
List:
- 1. ☐ (Optional) Understood theoretical aspects
- 2. ☐ Prepared transformers dev environment
- 3. ☐ Set up debugging environment of the original repository
- 4. ☐ Created script that successfully runs forward pass using original repository and checkpoint
- 5. ☐ Successfully added the model skeleton to Transformers
- 6. ☐ Successfully converted original checkpoint to Transformers checkpoint
- 7. ☐ Successfully ran forward pass in Transformers that gives identical output to original checkpoint
- 8. ☐ Finished model tests in Transformers
- 9. ☐ Successfully added Tokenizer in Transformers
- 10. ☐ Run end-to-end integration tests
- 11. ☐ Finished docs
- 12. ☐ Uploaded model weights to the hub
- 13. ☐ Submitted the pull request
- 14. ☐ (Optional) Added a demo notebook
To begin with, we usually recommend to start by getting a good theoretical understanding of `BrandNewBert`. However,
if you prefer to understand the theoretical aspects of the model *on-the-job*, then it is totally fine to directly dive
into the `BrandNewBert`'s code-base. This option might suit you better, if your engineering skills are better than
your theoretical skill, if you have trouble understanding `BrandNewBert`'s paper, or if you just enjoy programming
much more than reading scientific papers.
### 1. (Optional) Theoretical aspects of BrandNewBert
You should take some time to read *BrandNewBert's* paper, if such descriptive work exists. There might be large
sections of the paper that are difficult to understand. If this is the case, this is fine - don't worry! The goal is
not to get a deep theoretical understanding of the paper, but to extract the necessary information required to
effectively re-implement the model in 🤗 Transformers. That being said, you don't have to spend too much time on the
theoretical aspects, but rather focus on the practical ones, namely:
- What type of model is *brand_new_bert*? BERT-like encoder-only model? GPT2-like decoder-only model? BART-like
encoder-decoder model? Look at the [model_summary](model_summary) if you're not familiar with the differences between those.
- What are the applications of *brand_new_bert*? Text classification? Text generation? Seq2Seq tasks, *e.g.,*
summarization?
- What is the novel feature of the model making it different from BERT/GPT-2/BART?
- Which of the already existing [🤗 Transformers models](https://huggingface.co/transformers/#contents) is most
similar to *brand_new_bert*?
- What type of tokenizer is used? A sentencepiece tokenizer? Word piece tokenizer? Is it the same tokenizer as used
for BERT or BART?
After you feel like you have gotten a good overview of the architecture of the model, you might want to write to the
Hugging Face team with any questions you might have. This might include questions regarding the model's architecture,
its attention layer, etc. We will be more than happy to help you.
### 2. Next prepare your environment
1. Fork the [repository](https://github.com/huggingface/transformers) by clicking on the ‘Fork' button on the
repository's page. This creates a copy of the code under your GitHub user account.
2. Clone your `transformers` fork to your local disk, and add the base repository as a remote:
We expect that every model added to 🤗 Transformers passes a couple of integration tests, meaning that the original
model and the reimplemented version in 🤗 Transformers have to give the exact same output up to a precision of 0.001!
Since it is normal that the exact same model written in different libraries can give a slightly different output
depending on the library framework, we accept an error tolerance of 1e-3 (0.001). It is not enough if the model gives
nearly the same output, they have to be the almost identical. Therefore, you will certainly compare the intermediate
outputs of the 🤗 Transformers version multiple times against the intermediate outputs of the original implementation of
*brand_new_bert* in which case an **efficient** debugging environment of the original repository is absolutely
important. Here is some advice is to make your debugging environment as efficient as possible.
- Find the best way of debugging intermediate results. Is the original repository written in PyTorch? Then you should
probably take the time to write a longer script that decomposes the original model into smaller sub-components to
retrieve intermediate values. Is the original repository written in Tensorflow 1? Then you might have to rely on
TensorFlow print operations like [tf.print](https://www.tensorflow.org/api_docs/python/tf/print) to output
intermediate values. Is the original repository written in Jax? Then make sure that the model is **not jitted** when
running the forward pass, *e.g.* check-out [this link](https://github.com/google/jax/issues/196).
- Use the smallest pretrained checkpoint you can find. The smaller the checkpoint, the faster your debug cycle
becomes. It is not efficient if your pretrained model is so big that your forward pass takes more than 10 seconds.
In case only very large checkpoints are available, it might make more sense to create a dummy model in the new
environment with randomly initialized weights and save those weights for comparison with the 🤗 Transformers version
of your model
- Make sure you are using the easiest way of calling a forward pass in the original repository. Ideally, you want to
find the function in the original repository that **only** calls a single forward pass, *i.e.* that is often called
`predict`, `evaluate`, `forward` or `__call__`. You don't want to debug a function that calls `forward`
multiple times, *e.g.* to generate text, like `autoregressive_sample`, `generate`.
- Try to separate the tokenization from the model's *forward* pass. If the original repository shows examples where
you have to input a string, then try to find out where in the forward call the string input is changed to input ids
and start from this point. This might mean that you have to possibly write a small script yourself or change the
original code so that you can directly input the ids instead of an input string.
- Make sure that the model in your debugging setup is **not** in training mode, which often causes the model to yield
random outputs due to multiple dropout layers in the model. Make sure that the forward pass in your debugging
environment is **deterministic** so that the dropout layers are not used. Or use *transformers.utils.set_seed*
if the old and new implementations are in the same framework.
The following section gives you more specific details/tips on how you can do this for *brand_new_bert*.
### 5.-14. Port BrandNewBert to 🤗 Transformers
Next, you can finally start adding new code to 🤗 Transformers. Go into the clone of your 🤗 Transformers' fork:
```bash
cd transformers
```
In the special case that you are adding a model whose architecture exactly matches the model architecture of an
existing model you only have to add a conversion script as described in [this section](#write-a-conversion-script).
In this case, you can just re-use the whole model architecture of the already existing model.
Otherwise, let's start generating a new model. You have two choices here:
- `transformers-cli add-new-model-like` to add a new model like an existing one
- `transformers-cli add-new-model` to add a new model from our template (will look like BERT or Bart depending on the type of model you select)
In both cases, you will be prompted with a questionnaire to fill the basic information of your model. The second command requires to install `cookiecutter`, you can find more information on it [here](https://github.com/huggingface/transformers/tree/main/templates/adding_a_new_model).
**Open a Pull Request on the main huggingface/transformers repo**
Before starting to adapt the automatically generated code, now is the time to open a “Work in progress (WIP)” pull
request, *e.g.* “[WIP] Add *brand_new_bert*”, in 🤗 Transformers so that you and the Hugging Face team can work
side-by-side on integrating the model into 🤗 Transformers.
You should do the following:
1. Create a branch with a descriptive name from your main branch
Now you can finally start coding :). The generated code in
`src/transformers/models/brand_new_bert/modeling_brand_new_bert.py` will either have the same architecture as BERT if
it's an encoder-only model or BART if it's an encoder-decoder model. At this point, you should remind yourself what
you've learned in the beginning about the theoretical aspects of the model: *How is the model different from BERT or
BART?*". Implement those changes which often means to change the *self-attention* layer, the order of the normalization
layer, etc… Again, it is often useful to look at the similar architecture of already existing models in Transformers to
get a better feeling of how your model should be implemented.
**Note** that at this point, you don't have to be very sure that your code is fully correct or clean. Rather, it is
advised to add a first *unclean*, copy-pasted version of the original code to
`src/transformers/models/brand_new_bert/modeling_brand_new_bert.py` until you feel like all the necessary code is
added. From our experience, it is much more efficient to quickly add a first version of the required code and
improve/correct the code iteratively with the conversion script as described in the next section. The only thing that
has to work at this point is that you can instantiate the 🤗 Transformers implementation of *brand_new_bert*, *i.e.* the
following command should work:
```python
from transformers import BrandNewBertModel, BrandNewBertConfig
model = BrandNewBertModel(BrandNewBertConfig())
```
The above command will create a model according to the default parameters as defined in `BrandNewBertConfig()` with
random weights, thus making sure that the `init()` methods of all components works.
**6. Write a conversion script**
Next, you should write a conversion script that lets you convert the checkpoint you used to debug *brand_new_bert* in
the original repository to a checkpoint compatible with your just created 🤗 Transformers implementation of
*brand_new_bert*. It is not advised to write the conversion script from scratch, but rather to look through already
existing conversion scripts in 🤗 Transformers for one that has been used to convert a similar model that was written in
the same framework as *brand_new_bert*. Usually, it is enough to copy an already existing conversion script and
slightly adapt it for your use case. Don't hesitate to ask the Hugging Face team to point you to a similar already
existing conversion script for your model.
- If you are porting a model from TensorFlow to PyTorch, a good starting point might be BERT's conversion script [here](https://github.com/huggingface/transformers/blob/7acfa95afb8194f8f9c1f4d2c6028224dbed35a2/src/transformers/models/bert/modeling_bert.py#L91)
- If you are porting a model from PyTorch to PyTorch, a good starting point might be BART's conversion script [here](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bart/convert_bart_original_pytorch_checkpoint_to_pytorch.py)
In the following, we'll quickly explain how PyTorch models store layer weights and define layer names. In PyTorch, the
name of a layer is defined by the name of the class attribute you give the layer. Let's define a dummy model in
PyTorch, called `SimpleModel` as follows:
```python
from torch import nn
class SimpleModel(nn.Module):
def __init__(self):
super().__init__()
self.dense = nn.Linear(10, 10)
self.intermediate = nn.Linear(10, 10)
self.layer_norm = nn.LayerNorm(10)
```
Now we can create an instance of this model definition which will fill all weights: `dense`, `intermediate`,
`layer_norm` with random weights. We can print the model to see its architecture
Having managed to correctly load the pretrained weights into the 🤗 Transformers implementation, you should now make
sure that the forward pass is correctly implemented. In [Get familiar with the original repository](#run-a-pretrained-checkpoint-using-the-original-repository), you have already created a script that runs a forward
pass of the model using the original repository. Now you should write an analogous script using the 🤗 Transformers
implementation instead of the original one. It should look as follows:
```python
model = BrandNewBertModel.from_pretrained("/path/to/converted/checkpoint/folder")
input_ids = [0, 4, 4, 3, 2, 4, 1, 7, 19]
output = model(input_ids).last_hidden_states
```
It is very likely that the 🤗 Transformers implementation and the original model implementation don't give the exact
same output the very first time or that the forward pass throws an error. Don't be disappointed - it's expected! First,
you should make sure that the forward pass doesn't throw any errors. It often happens that the wrong dimensions are
used leading to a *Dimensionality mismatch* error or that the wrong data type object is used, *e.g.* `torch.long`
instead of `torch.float32`. Don't hesitate to ask the Hugging Face team for help, if you don't manage to solve
certain errors.
The final part to make sure the 🤗 Transformers implementation works correctly is to ensure that the outputs are
equivalent to a precision of `1e-3`. First, you should ensure that the output shapes are identical, *i.e.*
`outputs.shape` should yield the same value for the script of the 🤗 Transformers implementation and the original
implementation. Next, you should make sure that the output values are identical as well. This one of the most difficult
parts of adding a new model. Common mistakes why the outputs are not identical are:
- Some layers were not added, *i.e.* an *activation* layer was not added, or the residual connection was forgotten
- The word embedding matrix was not tied
- The wrong positional embeddings are used because the original implementation uses on offset
- Dropout is applied during the forward pass. To fix this make sure *model.training is False* and that no dropout
layer is falsely activated during the forward pass, *i.e.* pass *self.training* to [PyTorch's functional dropout](https://pytorch.org/docs/stable/nn.functional.html?highlight=dropout#torch.nn.functional.dropout)
The best way to fix the problem is usually to look at the forward pass of the original implementation and the 🤗
Transformers implementation side-by-side and check if there are any differences. Ideally, you should debug/print out
intermediate outputs of both implementations of the forward pass to find the exact position in the network where the 🤗
Transformers implementation shows a different output than the original implementation. First, make sure that the
hard-coded `input_ids` in both scripts are identical. Next, verify that the outputs of the first transformation of
the `input_ids` (usually the word embeddings) are identical. And then work your way up to the very last layer of the
network. At some point, you will notice a difference between the two implementations, which should point you to the bug
in the 🤗 Transformers implementation. From our experience, a simple and efficient way is to add many print statements
in both the original implementation and 🤗 Transformers implementation, at the same positions in the network
respectively, and to successively remove print statements showing the same values for intermediate presentations.
When you're confident that both implementations yield the same output, verifying the outputs with
`torch.allclose(original_output, output, atol=1e-3)`, you're done with the most difficult part! Congratulations - the
work left to be done should be a cakewalk 😊.
**8. Adding all necessary model tests**
At this point, you have successfully added a new model. However, it is very much possible that the model does not yet
fully comply with the required design. To make sure, the implementation is fully compatible with 🤗 Transformers, all
common tests should pass. The Cookiecutter should have automatically added a test file for your model, probably under
the same `tests/test_modeling_brand_new_bert.py`. Run this test file to verify that all common tests pass:
```bash
pytest tests/test_modeling_brand_new_bert.py
```
Having fixed all common tests, it is now crucial to ensure that all the nice work you have done is well tested, so that
- a) The community can easily understand your work by looking at specific tests of *brand_new_bert*
- b) Future changes to your model will not break any important feature of the model.
At first, integration tests should be added. Those integration tests essentially do the same as the debugging scripts
you used earlier to implement the model to 🤗 Transformers. A template of those model tests is already added by the
Cookiecutter, called `BrandNewBertModelIntegrationTests` and only has to be filled out by you. To ensure that those
When both `input_ids` yield the same values, as a final step a tokenizer test file should also be added.
Analogous to the modeling test files of *brand_new_bert*, the tokenization test files of *brand_new_bert* should
contain a couple of hard-coded integration tests.
**10. Run End-to-end integration tests**
Having added the tokenizer, you should also add a couple of end-to-end integration tests using both the model and the
tokenizer to `tests/test_modeling_brand_new_bert.py` in 🤗 Transformers. Such a test should show on a meaningful
text-to-text sample that the 🤗 Transformers implementation works as expected. A meaningful text-to-text sample can
include *e.g.* a source-to-target-translation pair, an article-to-summary pair, a question-to-answer pair, etc… If none
of the ported checkpoints has been fine-tuned on a downstream task it is enough to simply rely on the model tests. In a
final step to ensure that the model is fully functional, it is advised that you also run all tests on GPU. It can
happen that you forgot to add some `.to(self.device)` statements to internal tensors of the model, which in such a
test would show in an error. In case you have no access to a GPU, the Hugging Face team can take care of running those
tests for you.
**11. Add Docstring**
Now, all the necessary functionality for *brand_new_bert* is added - you're almost done! The only thing left to add is
a nice docstring and a doc page. The Cookiecutter should have added a template file called
`docs/source/model_doc/brand_new_bert.rst` that you should fill out. Users of your model will usually first look at
this page before using your model. Hence, the documentation must be understandable and concise. It is very useful for
the community to add some *Tips* to show how the model should be used. Don't hesitate to ping the Hugging Face team
regarding the docstrings.
Next, make sure that the docstring added to `src/transformers/models/brand_new_bert/modeling_brand_new_bert.py` is
correct and included all necessary inputs and outputs. We have a detailed guide about writing documentation and our docstring format [here](writing-documentation). It is always to good to remind oneself that documentation should
be treated at least as carefully as the code in 🤗 Transformers since the documentation is usually the first contact
point of the community with the model.
**Code refactor**
Great, now you have added all the necessary code for *brand_new_bert*. At this point, you should correct some potential
incorrect code style by running:
```bash
make style
```
and verify that your coding style passes the quality check:
```bash
make quality
```
There are a couple of other very strict design tests in 🤗 Transformers that might still be failing, which shows up in
the tests of your pull request. This is often because of some missing information in the docstring or some incorrect
naming. The Hugging Face team will surely help you if you're stuck here.
Lastly, it is always a good idea to refactor one's code after having ensured that the code works correctly. With all
tests passing, now it's a good time to go over the added code again and do some refactoring.
You have now finished the coding part, congratulation! 🎉 You are Awesome! 😎
**12. Upload the models to the model hub**
In this final part, you should convert and upload all checkpoints to the model hub and add a model card for each
uploaded model checkpoint. You can get familiar with the hub functionalities by reading our [Model sharing and uploading Page](model_sharing). You should work alongside the Hugging Face team here to decide on a fitting name for each
checkpoint and to get the required access rights to be able to upload the model under the author's organization of
*brand_new_bert*. The `push_to_hub` method, present in all models in `transformers`, is a quick and efficient way to push your checkpoint to the hub. A little snippet is pasted below:
```python
brand_new_bert.push_to_hub("brand_new_bert")
# Uncomment the following line to push to an organization.
Try to keep the inputs/outputs very simple and ideally JSON-serializable as it makes the pipeline usage very easy
without requiring users to understand new kind of objects. It's also relatively common to support many different types
of arguments for ease of use (audio files, can be filenames, URLs or pure bytes)
## Adding it to the list of supported tasks
To register your `new-task` to the list of supported tasks, you have to add it to the `PIPELINE_REGISTRY`:
```python
from transformers.pipelines import PIPELINE_REGISTRY
PIPELINE_REGISTRY.register_pipeline(
"new-task",
pipeline_class=MyPipeline,
pt_model=AutoModelForSequenceClassification,
)
```
You can specify a default model if you want, in which case it should come with a specific revision (which can be the name of a branch or a commit hash, here we took `"abcdef"`) as well was the type:
```python
PIPELINE_REGISTRY.register_pipeline(
"new-task",
pipeline_class=MyPipeline,
pt_model=AutoModelForSequenceClassification,
default={"pt": ("user/awesome_model", "abcdef")},
type="text", # current support type: text, audio, image, multimodal
)
```
## Share your pipeline on the Hub
To share your custom pipeline on the Hub, you just have to save the custom code of your `Pipeline` subclass in a
python file. For instance, let's say we want to use a custom pipeline for sentence pair classification like this:
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Load pretrained instances with an AutoClass
With so many different Transformer architectures, it can be challenging to create one for your checkpoint. As a part of 🤗 Transformers core philosophy to make the library easy, simple and flexible to use, an `AutoClass` automatically infer and load the correct architecture from a given checkpoint. The `from_pretrained` method lets you quickly load a pretrained model for any architecture so you don't have to devote time and resources to train a model from scratch. Producing this type of checkpoint-agnostic code means if your code works for one checkpoint, it will work with another checkpoint - as long as it was trained for a similar task - even if the architecture is different.
<Tip>
Remember, architecture refers to the skeleton of the model and checkpoints are the weights for a given architecture. For example, [BERT](https://huggingface.co/bert-base-uncased) is an architecture, while `bert-base-uncased` is a checkpoint. Model is a general term that can mean either architecture or checkpoint.
</Tip>
In this tutorial, learn to:
* Load a pretrained tokenizer.
* Load a pretrained feature extractor.
* Load a pretrained processor.
* Load a pretrained model.
## AutoTokenizer
Nearly every NLP task begins with a tokenizer. A tokenizer converts your input into a format that can be processed by the model.
Load a tokenizer with [`AutoTokenizer.from_pretrained`]:
Multimodal tasks require a processor that combines two types of preprocessing tools. For example, the [LayoutLMV2](model_doc/layoutlmv2) model requires a feature extractor to handle images and a tokenizer to handle text; a processor combines both of them.
Load a processor with [`AutoProcessor.from_pretrained`]:
Finally, the `AutoModelFor` classes let you load a pretrained model for a given task (see [here](model_doc/auto) for a complete list of available tasks). For example, load a model for sequence classification with [`AutoModelForSequenceClassification.from_pretrained`]:
```py
>>> from transformers import AutoModelForSequenceClassification
>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
```
Easily reuse the same checkpoint to load an architecture for a different task:
```py
>>> from transformers import AutoModelForTokenClassification
>>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
```
Generally, we recommend using the `AutoTokenizer` class and the `AutoModelFor` class to load pretrained instances of models. This will ensure you load the correct architecture every time. In the next [tutorial](preprocessing), learn how to use your newly loaded tokenizer, feature extractor and processor to preprocess a dataset for fine-tuning.
</pt>
<tf>
Finally, the `TFAutoModelFor` classes let you load a pretrained model for a given task (see [here](model_doc/auto) for a complete list of available tasks). For example, load a model for sequence classification with [`TFAutoModelForSequenceClassification.from_pretrained`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
```
Easily reuse the same checkpoint to load an architecture for a different task:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased")
```
Generally, we recommend using the `AutoTokenizer` class and the `TFAutoModelFor` class to load pretrained instances of models. This will ensure you load the correct architecture every time. In the next [tutorial](preprocessing), learn how to use your newly loaded tokenizer, feature extractor and processor to preprocess a dataset for fine-tuning.
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Benchmarks
<Tip warning={true}>
Hugging Face's Benchmarking tools are deprecated and it is advised to use external Benchmarking libraries to measure the speed
and memory complexity of Transformer models.
</Tip>
[[open-in-colab]]
Let's take a look at how 🤗 Transformers models can be benchmarked, best practices, and already available benchmarks.
A notebook explaining in more detail how to benchmark 🤗 Transformers models can be found [here](https://github.com/huggingface/notebooks/tree/main/examples/benchmark.ipynb).
## How to benchmark 🤗 Transformers models
The classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] allow to flexibly benchmark 🤗 Transformers models. The benchmark classes allow us to measure the _peak memory usage_ and _required time_ for both _inference_ and _training_.
<Tip>
Hereby, _inference_ is defined by a single forward pass, and _training_ is defined by a single forward pass and
backward pass.
</Tip>
The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an object of type [`PyTorchBenchmarkArguments`] and
[`TensorFlowBenchmarkArguments`], respectively, for instantiation. [`PyTorchBenchmarkArguments`] and [`TensorFlowBenchmarkArguments`] are data classes and contain all relevant configurations for their corresponding benchmark class. In the following example, it is shown how a BERT model of type _bert-base-cased_ can be benchmarked.
<frameworkcontent>
<pt>
```py
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments
==================== ENVIRONMENT INFORMATION ====================
- transformers_version: 2.11.0
- framework: Tensorflow
- use_xla: False
- framework_version: 2.2.0
- python_version: 3.6.10
- system: Linux
- cpu: x86_64
- architecture: 64bit
- date: 2020-06-29
- time: 09:38:15.487125
- fp16: False
- use_multiprocessing: True
- only_pretrain_model: False
- cpu_ram_mb: 32088
- use_gpu: True
- num_gpus: 1
- gpu: TITAN RTX
- gpu_ram_mb: 24217
- gpu_power_watts: 280.0
- gpu_performance_state: 2
- use_tpu: False
```
</tf>
</frameworkcontent>
Again, _inference time_ and _required memory_ for _inference_ are measured, but this time for customized configurations
of the `BertModel` class. This feature can especially be helpful when deciding for which configuration the model
should be trained.
## Benchmark best practices
This section lists a couple of best practices one should be aware of when benchmarking a model.
- Currently, only single device benchmarking is supported. When benchmarking on GPU, it is recommended that the user
specifies on which device the code should be run by setting the `CUDA_VISIBLE_DEVICES` environment variable in the
shell, _e.g._ `export CUDA_VISIBLE_DEVICES=0` before running the code.
- The option `no_multi_processing` should only be set to `True` for testing and debugging. To ensure accurate
memory measurement it is recommended to run each memory benchmark in a separate process by making sure
`no_multi_processing` is set to `True`.
- One should always state the environment information when sharing the results of a model benchmark. Results can vary
heavily between different GPU devices, library versions, etc., so that benchmark results on their own are not very
useful for the community.
## Sharing your benchmark
Previously all available core models (10 at the time) have been benchmarked for _inference time_, across many different
settings: using PyTorch, with and without TorchScript, using TensorFlow, with and without XLA. All of those tests were
done across CPUs (except for TensorFlow XLA) and GPUs.
The approach is detailed in the [following blogpost](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2) and the results are
available [here](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing).
With the new _benchmark_ tools, it is easier than ever to share your benchmark results with the community
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# BERTology
There is a growing field of study concerned with investigating the inner working of large-scale transformers like BERT
(that some call "BERTology"). Some good examples of this field are:
- BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick:
https://arxiv.org/abs/1905.05950
- Are Sixteen Heads Really Better than One? by Paul Michel, Omer Levy, Graham Neubig: https://arxiv.org/abs/1905.10650
- What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D.
Manning: https://arxiv.org/abs/1906.04341
In order to help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models to
help people access the inner representations, mainly adapted from the great work of Paul Michel
(https://arxiv.org/abs/1905.10650):
- accessing all the hidden-states of BERT/GPT/GPT-2,
- accessing all the attention weights for each head of BERT/GPT/GPT-2,
- retrieving heads output values and gradients to be able to compute head importance score and prune head as explained
in https://arxiv.org/abs/1905.10650.
To help you understand and use these features, we have added a specific example script: [bertology.py](https://github.com/huggingface/transformers/tree/main/examples/research_projects/bertology/run_bertology.py) while extract information and prune a model pre-trained on
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Instantiating a big model
When you want to use a very big pretrained model, one challenge is to minimize the use of the RAM. The usual workflow
from PyTorch is:
1. Create your model with random weights.
2. Load your pretrained weights.
3. Put those pretrained weights in your random model.
Step 1 and 2 both require a full version of the model in memory, which is not a problem in most cases, but if your model starts weighing several GigaBytes, those two copies can make you got our of RAM. Even worse, if you are using `torch.distributed` to launch a distributed training, each process will load the pretrained model and store these two copies in RAM.
<Tip>
Note that the randomly created model is initialized with "empty" tensors, which take the space in memory without filling it (thus the random values are whatever was in this chunk of memory at a given time). The random initialization following the appropriate distribution for the kind of model/parameters instatiated (like a normal distribution for instance) is only performed after step 3 on the non-initialized weights, to be as fast as possible!
</Tip>
In this guide, we explore the solutions Transformers offer to deal with this issue. Note that this is an area of active development, so the APIs explained here may change slightly in the future.
## Sharded checkpoints
Since version 4.18.0, model checkpoints that end up taking more than 10GB of space are automatically sharded in smaller pieces. In terms of having one single checkpoint when you do `model.save_pretrained(save_dir)`, you will end up with several partial checkpoints (each of which being of size < 10GB) and an index that maps parameter names to the files they are stored in.
You can control the maximum size before sharding with the `max_shard_size` parameter, so for the sake of an example, we'll use a normal-size models with a small shard size: let's take a traditional BERT model.
```py
from transformers import AutoModel
model = AutoModel.from_pretrained("bert-base-cased")
```
If you save it using [`~PreTrainedModel.save_pretrained`], you will get a new folder with two files: the config of the model and its weights:
```py
>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as tmp_dir:
... model.save_pretrained(tmp_dir)
... print(sorted(os.listdir(tmp_dir)))
['config.json', 'pytorch_model.bin']
```
Now let's use a maximum shard size of 200MB:
```py
>>> with tempfile.TemporaryDirectory() as tmp_dir:
On top of the configuration of the model, we see three different weights files, and an `index.json` file which is our index. A checkpoint like this can be fully reloaded using the [`~PreTrainedModel.from_pretrained`] method:
```py
>>> with tempfile.TemporaryDirectory() as tmp_dir:
The main advantage of doing this for big models is that during step 2 of the workflow shown above, each shard of the checkpoint is loaded after the previous one, capping the memory usage in RAM to the model size plus the size of the biggest shard.
Beind the scenes, the index file is used to determine which keys are in the checkpoint, and where the corresponding weights are stored. We can load that index like any json and get a dictionary:
```py
>>> import json
>>> with tempfile.TemporaryDirectory() as tmp_dir:
... with open(os.path.join(tmp_dir, "pytorch_model.bin.index.json"), "r") as f:
... index = json.load(f)
>>> print(index.keys())
dict_keys(['metadata', 'weight_map'])
```
The metadata just consists of the total size of the model for now. We plan to add several other informations in the future:
```py
>>> index["metadata"]
{'total_size': 433245184}
```
The weights map is the main part of this index, which maps each parameter name (as usually found in a PyTorch model `state_dict`) to the file it's stored in:
If you want to directly load such a sharded checkpoint inside a model without using [`~PreTrainedModel.from_pretrained`] (like you would do `model.load_state_dict()` for a full checkpoint) you should use [`~modeling_utils.load_sharded_checkpoint`]:
```py
>>> from transformers.modeling_utils import load_sharded_checkpoint
>>> with tempfile.TemporaryDirectory() as tmp_dir:
Sharded checkpoints reduce the memory usage during step 2 of the workflow mentioned above, but in order to use that model in a low memory setting, we recommend leveraging our tools based on the Accelerate library.
Please read the following guide for more information: [Large model loading using Accelerate](./main_classes/model#large-model-loading)
This page regroups resources around 🤗 Transformers developed by the community.
## Community resources:
| Resource | Description | Author |
|:----------|:-------------|------:|
| [Hugging Face Transformers Glossary Flashcards](https://www.darigovresearch.com/huggingface-transformers-glossary-flashcards) | A set of flashcards based on the [Transformers Docs Glossary](glossary) that has been put into a form which can be easily learnt/revised using [Anki ](https://apps.ankiweb.net/) an open source, cross platform app specifically designed for long term knowledge retention. See this [Introductory video on how to use the flashcards](https://www.youtube.com/watch?v=Dji_h7PILrw). | [Darigov Research](https://www.darigovresearch.com/) |
| [Fine-tune a pre-trained Transformer to generate lyrics](https://github.com/AlekseyKorshuk/huggingartists) | How to generate lyrics in the style of your favorite artist by fine-tuning a GPT-2 model | [Aleksey Korshuk](https://github.com/AlekseyKorshuk) | [](https://colab.research.google.com/github/AlekseyKorshuk/huggingartists/blob/master/huggingartists-demo.ipynb) |
| [Train T5 in Tensorflow 2 ](https://github.com/snapthat/TF-T5-text-to-text) | How to train T5 for any task using Tensorflow 2. This notebook demonstrates a Question & Answer task implemented in Tensorflow 2 using SQUAD | [Muhammad Harris](https://github.com/HarrisDePerceptron) |[](https://colab.research.google.com/github/snapthat/TF-T5-text-to-text/blob/master/snapthatT5/notebooks/TF-T5-Datasets%20Training.ipynb) |
| [Train T5 on TPU](https://github.com/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb) | How to train T5 on SQUAD with Transformers and Nlp | [Suraj Patil](https://github.com/patil-suraj) |[](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb#scrollTo=QLGiFCDqvuil) |
| [Fine-tune T5 for Classification and Multiple Choice](https://github.com/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) | How to fine-tune T5 for classification and multiple choice tasks using a text-to-text format with PyTorch Lightning | [Suraj Patil](https://github.com/patil-suraj) | [](https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/t5_fine_tuning.ipynb) |
| [Fine-tune DialoGPT on New Datasets and Languages](https://github.com/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) | How to fine-tune the DialoGPT model on a new dataset for open-dialog conversational chatbots | [Nathan Cooper](https://github.com/ncoop57) | [](https://colab.research.google.com/github/ncoop57/i-am-a-nerd/blob/master/_notebooks/2020-05-12-chatbot-part-1.ipynb) |
| [Long Sequence Modeling with Reformer](https://github.com/patrickvonplaten/notebooks/blob/master/PyTorch_Reformer.ipynb) | How to train on sequences as long as 500,000 tokens with Reformer | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/PyTorch_Reformer.ipynb) |
| [Fine-tune BART for Summarization](https://github.com/ohmeow/ohmeow_website/blob/master/_notebooks/2020-05-23-text-generation-with-blurr.ipynb) | How to fine-tune BART for summarization with fastai using blurr | [Wayde Gilliam](https://ohmeow.com/) | [](https://colab.research.google.com/github/ohmeow/ohmeow_website/blob/master/_notebooks/2020-05-23-text-generation-with-blurr.ipynb) |
| [Fine-tune a pre-trained Transformer on anyone's tweets](https://colab.research.google.com/github/borisdayma/huggingtweets/blob/master/huggingtweets-demo.ipynb) | How to generate tweets in the style of your favorite Twitter account by fine-tuning a GPT-2 model | [Boris Dayma](https://github.com/borisdayma) | [](https://colab.research.google.com/github/borisdayma/huggingtweets/blob/master/huggingtweets-demo.ipynb) |
| [Optimize 🤗 Hugging Face models with Weights & Biases](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/huggingface/Optimize_Hugging_Face_models_with_Weights_%26_Biases.ipynb) | A complete tutorial showcasing W&B integration with Hugging Face | [Boris Dayma](https://github.com/borisdayma) | [](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/huggingface/Optimize_Hugging_Face_models_with_Weights_%26_Biases.ipynb) |
| [Pretrain Longformer](https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb) | How to build a "long" version of existing pretrained models | [Iz Beltagy](https://beltagy.net) | [](https://colab.research.google.com/github/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb) |
| [Fine-tune Longformer for QA](https://github.com/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb) | How to fine-tune longformer model for QA task | [Suraj Patil](https://github.com/patil-suraj) | [](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb) |
| [Evaluate Model with 🤗nlp](https://github.com/patrickvonplaten/notebooks/blob/master/How_to_evaluate_Longformer_on_TriviaQA_using_NLP.ipynb) | How to evaluate longformer on TriviaQA with `nlp` | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/drive/1m7eTGlPmLRgoPkkA7rkhQdZ9ydpmsdLE?usp=sharing) |
| [Fine-tune T5 for Sentiment Span Extraction](https://github.com/enzoampil/t5-intro/blob/master/t5_qa_training_pytorch_span_extraction.ipynb) | How to fine-tune T5 for sentiment span extraction using a text-to-text format with PyTorch Lightning | [Lorenzo Ampil](https://github.com/enzoampil) | [](https://colab.research.google.com/github/enzoampil/t5-intro/blob/master/t5_qa_training_pytorch_span_extraction.ipynb) |
| [Fine-tune DistilBert for Multiclass Classification](https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_multiclass_classification.ipynb) | How to fine-tune DistilBert for multiclass classification with PyTorch | [Abhishek Kumar Mishra](https://github.com/abhimishra91) | [](https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_multiclass_classification.ipynb)|
|[Fine-tune BERT for Multi-label Classification](https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_multi_label_classification.ipynb)|How to fine-tune BERT for multi-label classification using PyTorch|[Abhishek Kumar Mishra](https://github.com/abhimishra91) |[](https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_multi_label_classification.ipynb)|
|[Fine-tune T5 for Summarization](https://github.com/abhimishra91/transformers-tutorials/blob/master/transformers_summarization_wandb.ipynb)|How to fine-tune T5 for summarization in PyTorch and track experiments with WandB|[Abhishek Kumar Mishra](https://github.com/abhimishra91) |[](https://colab.research.google.com/github/abhimishra91/transformers-tutorials/blob/master/transformers_summarization_wandb.ipynb)|
|[Speed up Fine-Tuning in Transformers with Dynamic Padding / Bucketing](https://github.com/ELS-RD/transformers-notebook/blob/master/Divide_Hugging_Face_Transformers_training_time_by_2_or_more.ipynb)|How to speed up fine-tuning by a factor of 2 using dynamic padding / bucketing|[Michael Benesty](https://github.com/pommedeterresautee) |[](https://colab.research.google.com/drive/1CBfRU1zbfu7-ijiOqAAQUA-RJaxfcJoO?usp=sharing)|
|[Pretrain Reformer for Masked Language Modeling](https://github.com/patrickvonplaten/notebooks/blob/master/Reformer_For_Masked_LM.ipynb)| How to train a Reformer model with bi-directional self-attention layers | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/drive/1tzzh0i8PgDQGV3SMFUGxM7_gGae3K-uW?usp=sharing)|
|[Expand and Fine Tune Sci-BERT](https://github.com/lordtt13/word-embeddings/blob/master/COVID-19%20Research%20Data/COVID-SciBERT.ipynb)| How to increase vocabulary of a pretrained SciBERT model from AllenAI on the CORD dataset and pipeline it. | [Tanmay Thakur](https://github.com/lordtt13) | [](https://colab.research.google.com/drive/1rqAR40goxbAfez1xvF3hBJphSCsvXmh8)|
|[Fine Tune BlenderBotSmall for Summarization using the Trainer API](https://github.com/lordtt13/transformers-experiments/blob/master/Custom%20Tasks/fine-tune-blenderbot_small-for-summarization.ipynb)| How to fine tune BlenderBotSmall for summarization on a custom dataset, using the Trainer API. | [Tanmay Thakur](https://github.com/lordtt13) | [](https://colab.research.google.com/drive/19Wmupuls7mykSGyRN_Qo6lPQhgp56ymq?usp=sharing)|
|[Fine-tune Electra and interpret with Integrated Gradients](https://github.com/elsanns/xai-nlp-notebooks/blob/master/electra_fine_tune_interpret_captum_ig.ipynb) | How to fine-tune Electra for sentiment analysis and interpret predictions with Captum Integrated Gradients | [Eliza Szczechla](https://elsanns.github.io) | [](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/electra_fine_tune_interpret_captum_ig.ipynb)|
|[fine-tune a non-English GPT-2 Model with Trainer class](https://github.com/philschmid/fine-tune-GPT-2/blob/master/Fine_tune_a_non_English_GPT_2_Model_with_Huggingface.ipynb) | How to fine-tune a non-English GPT-2 Model with Trainer class | [Philipp Schmid](https://www.philschmid.de) | [](https://colab.research.google.com/github/philschmid/fine-tune-GPT-2/blob/master/Fine_tune_a_non_English_GPT_2_Model_with_Huggingface.ipynb)|
|[Fine-tune a DistilBERT Model for Multi Label Classification task](https://github.com/DhavalTaunk08/Transformers_scripts/blob/master/Transformers_multilabel_distilbert.ipynb) | How to fine-tune a DistilBERT Model for Multi Label Classification task | [Dhaval Taunk](https://github.com/DhavalTaunk08) | [](https://colab.research.google.com/github/DhavalTaunk08/Transformers_scripts/blob/master/Transformers_multilabel_distilbert.ipynb)|
|[Fine-tune ALBERT for sentence-pair classification](https://github.com/NadirEM/nlp-notebooks/blob/master/Fine_tune_ALBERT_sentence_pair_classification.ipynb) | How to fine-tune an ALBERT model or another BERT-based model for the sentence-pair classification task | [Nadir El Manouzi](https://github.com/NadirEM) | [](https://colab.research.google.com/github/NadirEM/nlp-notebooks/blob/master/Fine_tune_ALBERT_sentence_pair_classification.ipynb)|
|[Fine-tune Roberta for sentiment analysis](https://github.com/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb) | How to fine-tune a Roberta model for sentiment analysis | [Dhaval Taunk](https://github.com/DhavalTaunk08) | [](https://colab.research.google.com/github/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb)|
|[Evaluating Question Generation Models](https://github.com/flexudy-pipe/qugeev) | How accurate are the answers to questions generated by your seq2seq transformer model? | [Pascal Zoleko](https://github.com/zolekode) | [](https://colab.research.google.com/drive/1bpsSqCQU-iw_5nNoRm_crPq6FRuJthq_?usp=sharing)|
|[Classify text with DistilBERT and Tensorflow](https://github.com/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb) | How to fine-tune DistilBERT for text classification in TensorFlow | [Peter Bayerle](https://github.com/peterbayerle) | [](https://colab.research.google.com/github/peterbayerle/huggingface_notebook/blob/main/distilbert_tf.ipynb)|
|[Leverage BERT for Encoder-Decoder Summarization on CNN/Dailymail](https://github.com/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb) | How to warm-start a *EncoderDecoderModel* with a *bert-base-uncased* checkpoint for summarization on CNN/Dailymail | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/BERT2BERT_for_CNN_Dailymail.ipynb)|
|[Leverage RoBERTa for Encoder-Decoder Summarization on BBC XSum](https://github.com/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb) | How to warm-start a shared *EncoderDecoderModel* with a *roberta-base* checkpoint for summarization on BBC/XSum | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/RoBERTaShared_for_BBC_XSum.ipynb)|
|[Fine-tune TAPAS on Sequential Question Answering (SQA)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb) | How to fine-tune *TapasForQuestionAnswering* with a *tapas-base* checkpoint on the Sequential Question Answering (SQA) dataset | [Niels Rogge](https://github.com/nielsrogge) | [](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Fine_tuning_TapasForQuestionAnswering_on_SQA.ipynb)|
|[Evaluate TAPAS on Table Fact Checking (TabFact)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb) | How to evaluate a fine-tuned *TapasForSequenceClassification* with a *tapas-base-finetuned-tabfact* checkpoint using a combination of the 🤗 datasets and 🤗 transformers libraries | [Niels Rogge](https://github.com/nielsrogge) | [](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/TAPAS/Evaluating_TAPAS_on_the_Tabfact_test_set.ipynb)|
|[Fine-tuning mBART for translation](https://colab.research.google.com/github/vasudevgupta7/huggingface-tutorials/blob/main/translation_training.ipynb) | How to fine-tune mBART using Seq2SeqTrainer for Hindi to English translation | [Vasudev Gupta](https://github.com/vasudevgupta7) | [](https://colab.research.google.com/github/vasudevgupta7/huggingface-tutorials/blob/main/translation_training.ipynb)|
|[Fine-tune LayoutLM on FUNSD (a form understanding dataset)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb) | How to fine-tune *LayoutLMForTokenClassification* on the FUNSD dataset for information extraction from scanned documents | [Niels Rogge](https://github.com/nielsrogge) | [](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb)|
|[Fine-Tune DistilGPT2 and Generate Text](https://colab.research.google.com/github/tripathiaakash/DistilGPT2-Tutorial/blob/main/distilgpt2_fine_tuning.ipynb) | How to fine-tune DistilGPT2 and generate text | [Aakash Tripathi](https://github.com/tripathiaakash) | [](https://colab.research.google.com/github/tripathiaakash/DistilGPT2-Tutorial/blob/main/distilgpt2_fine_tuning.ipynb)|
|[Fine-Tune LED on up to 8K tokens](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb) | How to fine-tune LED on pubmed for long-range summarization | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb)|
|[Evaluate LED on Arxiv](https://github.com/patrickvonplaten/notebooks/blob/master/LED_on_Arxiv.ipynb) | How to effectively evaluate LED on long-range summarization | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/LED_on_Arxiv.ipynb)|
|[Fine-tune LayoutLM on RVL-CDIP (a document image classification dataset)](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForSequenceClassification_on_RVL_CDIP.ipynb) | How to fine-tune *LayoutLMForSequenceClassification* on the RVL-CDIP dataset for scanned document classification | [Niels Rogge](https://github.com/nielsrogge) | [](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForSequenceClassification_on_RVL_CDIP.ipynb)|
|[Wav2Vec2 CTC decoding with GPT2 adjustment](https://github.com/voidful/huggingface_notebook/blob/main/xlsr_gpt.ipynb) | How to decode CTC sequence with language model adjustment | [Eric Lam](https://github.com/voidful) | [](https://colab.research.google.com/drive/1e_z5jQHYbO2YKEaUgzb1ww1WwiAyydAj?usp=sharing)|
|[Fine-tune BART for summarization in two languages with Trainer class](https://github.com/elsanns/xai-nlp-notebooks/blob/master/fine_tune_bart_summarization_two_langs.ipynb) | How to fine-tune BART for summarization in two languages with Trainer class | [Eliza Szczechla](https://github.com/elsanns) | [](https://colab.research.google.com/github/elsanns/xai-nlp-notebooks/blob/master/fine_tune_bart_summarization_two_langs.ipynb)|
|[Evaluate Big Bird on Trivia QA](https://github.com/patrickvonplaten/notebooks/blob/master/Evaluating_Big_Bird_on_TriviaQA.ipynb) | How to evaluate BigBird on long document question answering on Trivia QA | [Patrick von Platen](https://github.com/patrickvonplaten) | [](https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Evaluating_Big_Bird_on_TriviaQA.ipynb)|
| [Create video captions using Wav2Vec2](https://github.com/Muennighoff/ytclipcc/blob/main/wav2vec_youtube_captions.ipynb) | How to create YouTube captions from any video by transcribing the audio with Wav2Vec | [Niklas Muennighoff](https://github.com/Muennighoff) |[](https://colab.research.google.com/github/Muennighoff/ytclipcc/blob/main/wav2vec_youtube_captions.ipynb) |
| [Fine-tune the Vision Transformer on CIFAR-10 using PyTorch Lightning](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_PyTorch_Lightning.ipynb) | How to fine-tune the Vision Transformer (ViT) on CIFAR-10 using HuggingFace Transformers, Datasets and PyTorch Lightning | [Niels Rogge](https://github.com/nielsrogge) |[](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_PyTorch_Lightning.ipynb) |
| [Fine-tune the Vision Transformer on CIFAR-10 using the 🤗 Trainer](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_the_%F0%9F%A4%97_Trainer.ipynb) | How to fine-tune the Vision Transformer (ViT) on CIFAR-10 using HuggingFace Transformers, Datasets and the 🤗 Trainer | [Niels Rogge](https://github.com/nielsrogge) |[](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/VisionTransformer/Fine_tuning_the_Vision_Transformer_on_CIFAR_10_with_the_%F0%9F%A4%97_Trainer.ipynb) |
| [Evaluate LUKE on Open Entity, an entity typing dataset](https://github.com/studio-ousia/luke/blob/master/notebooks/huggingface_open_entity.ipynb) | How to evaluate *LukeForEntityClassification* on the Open Entity dataset | [Ikuya Yamada](https://github.com/ikuyamada) |[](https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_open_entity.ipynb) |
| [Evaluate LUKE on TACRED, a relation extraction dataset](https://github.com/studio-ousia/luke/blob/master/notebooks/huggingface_tacred.ipynb) | How to evaluate *LukeForEntityPairClassification* on the TACRED dataset | [Ikuya Yamada](https://github.com/ikuyamada) |[](https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_tacred.ipynb) |
| [Evaluate LUKE on CoNLL-2003, an important NER benchmark](https://github.com/studio-ousia/luke/blob/master/notebooks/huggingface_conll_2003.ipynb) | How to evaluate *LukeForEntitySpanClassification* on the CoNLL-2003 dataset | [Ikuya Yamada](https://github.com/ikuyamada) |[](https://colab.research.google.com/github/studio-ousia/luke/blob/master/notebooks/huggingface_conll_2003.ipynb) |
| [Evaluate BigBird-Pegasus on PubMed dataset](https://github.com/vasudevgupta7/bigbird/blob/main/notebooks/bigbird_pegasus_evaluation.ipynb) | How to evaluate *BigBirdPegasusForConditionalGeneration* on PubMed dataset | [Vasudev Gupta](https://github.com/vasudevgupta7) | [](https://colab.research.google.com/github/vasudevgupta7/bigbird/blob/main/notebooks/bigbird_pegasus_evaluation.ipynb) |
| [Speech Emotion Classification with Wav2Vec2](https://github/m3hrdadfi/soxan/blob/main/notebooks/Emotion_recognition_in_Greek_speech_using_Wav2Vec2.ipynb) | How to leverage a pretrained Wav2Vec2 model for Emotion Classification on the MEGA dataset | [Mehrdad Farahani](https://github.com/m3hrdadfi) | [](https://colab.research.google.com/github/m3hrdadfi/soxan/blob/main/notebooks/Emotion_recognition_in_Greek_speech_using_Wav2Vec2.ipynb) |
| [Detect objects in an image with DETR](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/DETR_minimal_example_(with_DetrFeatureExtractor).ipynb) | How to use a trained *DetrForObjectDetection* model to detect objects in an image and visualize attention | [Niels Rogge](https://github.com/NielsRogge) | [](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/DETR/DETR_minimal_example_(with_DetrFeatureExtractor).ipynb) |
| [Fine-tune DETR on a custom object detection dataset](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb) | How to fine-tune *DetrForObjectDetection* on a custom object detection dataset | [Niels Rogge](https://github.com/NielsRogge) | [](https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/DETR/Fine_tuning_DetrForObjectDetection_on_custom_dataset_(balloon).ipynb) |
| [Finetune T5 for Named Entity Recognition](https://github.com/ToluClassics/Notebooks/blob/main/T5_Ner_Finetuning.ipynb) | How to fine-tune *T5* on a Named Entity Recognition Task | [Ogundepo Odunayo](https://github.com/ToluClassics) | [](https://colab.research.google.com/drive/1obr78FY_cBmWY5ODViCmzdY6O1KB65Vc?usp=sharing) |
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Converting Tensorflow Checkpoints
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints to models
that can be loaded using the `from_pretrained` methods of the library.
<Tip>
Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any
transformers >= 2.3.0 installation.
The documentation below reflects the **transformers-cli convert** command format.
</Tip>
## BERT
You can convert any TensorFlow checkpoint for BERT (in particular [the pre-trained models released by Google](https://github.com/google-research/bert#pre-trained-models)) in a PyTorch save file by using the
This CLI takes as input a TensorFlow checkpoint (three files starting with `bert_model.ckpt`) and the associated
configuration file (`bert_config.json`), and creates a PyTorch model for this configuration, loads the weights from
the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can
be imported using `from_pretrained()` (see example in [quicktour](quicktour) , [run_glue.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification/run_glue.py) ).
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
checkpoint (the three files starting with `bert_model.ckpt`) but be sure to keep the configuration file (\
`bert_config.json`) and the vocabulary file (`vocab.txt`) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (`pip install tensorflow`). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained `BERT-Base Uncased` model:
Here is an example of the conversion process for a pre-trained Transformer-XL model (see [here](https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models))
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Create a custom architecture
An [`AutoClass`](model_doc/auto) automatically infers the model architecture and downloads pretrained configuration and weights. Generally, we recommend using an `AutoClass` to produce checkpoint-agnostic code. But users who want more control over specific model parameters can create a custom 🤗 Transformers model from just a few base classes. This could be particularly useful for anyone who is interested in studying, training or experimenting with a 🤗 Transformers model. In this guide, dive deeper into creating a custom model without an `AutoClass`. Learn how to:
- Load and customize a model configuration.
- Create a model architecture.
- Create a slow and fast tokenizer for text.
- Create a feature extractor for audio or image tasks.
- Create a processor for multimodal tasks.
## Configuration
A [configuration](main_classes/configuration) refers to a model's specific attributes. Each model configuration has different attributes; for instance, all NLP models have the `hidden_size`, `num_attention_heads`, `num_hidden_layers` and `vocab_size` attributes in common. These attributes specify the number of attention heads or hidden layers to construct a model with.
Get a closer look at [DistilBERT](model_doc/distilbert) by accessing [`DistilBertConfig`] to inspect it's attributes:
```py
>>> from transformers import DistilBertConfig
>>> config = DistilBertConfig()
>>> print(config)
DistilBertConfig {
"activation": "gelu",
"attention_dropout": 0.1,
"dim": 768,
"dropout": 0.1,
"hidden_dim": 3072,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"model_type": "distilbert",
"n_heads": 12,
"n_layers": 6,
"pad_token_id": 0,
"qa_dropout": 0.1,
"seq_classif_dropout": 0.2,
"sinusoidal_pos_embds": false,
"transformers_version": "4.16.2",
"vocab_size": 30522
}
```
[`DistilBertConfig`] displays all the default attributes used to build a base [`DistilBertModel`]. All attributes are customizable, creating space for experimentation. For example, you can customize a default model to:
- Try a different activation function with the `activation` parameter.
- Use a higher dropout ratio for the attention probabilities with the `attention_dropout` parameter.
Once you are satisfied with your model configuration, you can save it with [`~PretrainedConfig.save_pretrained`]. Your configuration file is stored as a JSON file in the specified save directory:
You can also save your configuration file as a dictionary or even just the difference between your custom configuration attributes and the default configuration attributes! See the [configuration](main_classes/configuration) documentation for more details.
</Tip>
## Model
The next step is to create a [model](main_classes/models). The model - also loosely referred to as the architecture - defines what each layer is doing and what operations are happening. Attributes like `num_hidden_layers` from the configuration are used to define the architecture. Every model shares the base class [`PreTrainedModel`] and a few common methods like resizing input embeddings and pruning self-attention heads. In addition, all models are also either a [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html), [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) or [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/flax.linen.html#module) subclass. This means models are compatible with each of their respective framework's usage.
<frameworkcontent>
<pt>
Load your custom configuration attributes into the model:
This creates a model with random values instead of pretrained weights. You won't be able to use this model for anything useful yet until you train it. Training is a costly and time-consuming process. It is generally better to use a pretrained model to obtain better results faster, while using only a fraction of the resources required for training.
Create a pretrained model with [`~PreTrainedModel.from_pretrained`]:
```py
>>> model = DistilBertModel.from_pretrained("distilbert-base-uncased")
```
When you load pretrained weights, the default model configuration is automatically loaded if the model is provided by 🤗 Transformers. However, you can still replace - some or all of - the default model configuration attributes with your own if you'd like:
```py
>>> model = DistilBertModel.from_pretrained("distilbert-base-uncased", config=my_config)
```
</pt>
<tf>
Load your custom configuration attributes into the model:
This creates a model with random values instead of pretrained weights. You won't be able to use this model for anything useful yet until you train it. Training is a costly and time-consuming process. It is generally better to use a pretrained model to obtain better results faster, while using only a fraction of the resources required for training.
Create a pretrained model with [`~TFPreTrainedModel.from_pretrained`]:
When you load pretrained weights, the default model configuration is automatically loaded if the model is provided by 🤗 Transformers. However, you can still replace - some or all of - the default model configuration attributes with your own if you'd like:
At this point, you have a base DistilBERT model which outputs the *hidden states*. The hidden states are passed as inputs to a model head to produce the final output. 🤗 Transformers provides a different model head for each task as long as a model supports the task (i.e., you can't use DistilBERT for a sequence-to-sequence task like translation).
<frameworkcontent>
<pt>
For example, [`DistilBertForSequenceClassification`] is a base DistilBERT model with a sequence classification head. The sequence classification head is a linear layer on top of the pooled outputs.
```py
>>> from transformers import DistilBertForSequenceClassification
>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
```
Easily reuse this checkpoint for another task by switching to a different model head. For a question answering task, you would use the [`DistilBertForQuestionAnswering`] model head. The question answering head is similar to the sequence classification head except it is a linear layer on top of the hidden states output.
```py
>>> from transformers import DistilBertForQuestionAnswering
>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert-base-uncased")
```
</pt>
<tf>
For example, [`TFDistilBertForSequenceClassification`] is a base DistilBERT model with a sequence classification head. The sequence classification head is a linear layer on top of the pooled outputs.
```py
>>> from transformers import TFDistilBertForSequenceClassification
Easily reuse this checkpoint for another task by switching to a different model head. For a question answering task, you would use the [`TFDistilBertForQuestionAnswering`] model head. The question answering head is similar to the sequence classification head except it is a linear layer on top of the hidden states output.
```py
>>> from transformers import TFDistilBertForQuestionAnswering
The last base class you need before using a model for textual data is a [tokenizer](main_classes/tokenizer) to convert raw text to tensors. There are two types of tokenizers you can use with 🤗 Transformers:
- [`PreTrainedTokenizer`]: a Python implementation of a tokenizer.
- [`PreTrainedTokenizerFast`]: a tokenizer from our Rust-based [🤗 Tokenizer](https://huggingface.co/docs/tokenizers/python/latest/) library. This tokenizer type is significantly faster - especially during batch tokenization - due to it's Rust implementation. The fast tokenizer also offers additional methods like *offset mapping* which maps tokens to their original words or characters.
Both tokenizers support common methods such as encoding and decoding, adding new tokens, and managing special tokens.
<Tip warning={true}>
Not every model supports a fast tokenizer. Take a look at this [table](index#supported-frameworks) to check if a model has fast tokenizer support.
</Tip>
If you trained your own tokenizer, you can create one from your *vocabulary* file:
It is important to remember the vocabulary from a custom tokenizer will be different from the vocabulary generated by a pretrained model's tokenizer. You need to use a pretrained model's vocabulary if you are using a pretrained model, otherwise the inputs won't make sense. Create a tokenizer with a pretrained model's vocabulary with the [`DistilBertTokenizer`] class:
By default, [`AutoTokenizer`] will try to load a fast tokenizer. You can disable this behavior by setting `use_fast=False` in `from_pretrained`.
</Tip>
## Feature Extractor
A feature extractor processes audio or image inputs. It inherits from the base [`~feature_extraction_utils.FeatureExtractionMixin`] class, and may also inherit from the [`ImageFeatureExtractionMixin`] class for processing image features or the [`SequenceFeatureExtractor`] class for processing audio inputs.
Depending on whether you are working on an audio or vision task, create a feature extractor associated with the model you're using. For example, create a default [`ViTFeatureExtractor`] if you are using [ViT](model_doc/vit) for image classification:
```py
>>> from transformers import ViTFeatureExtractor
>>> vit_extractor = ViTFeatureExtractor()
>>> print(vit_extractor)
ViTFeatureExtractor {
"do_normalize": true,
"do_resize": true,
"feature_extractor_type": "ViTFeatureExtractor",
"image_mean": [
0.5,
0.5,
0.5
],
"image_std": [
0.5,
0.5,
0.5
],
"resample": 2,
"size": 224
}
```
<Tip>
If you aren't looking for any customization, just use the `from_pretrained` method to load a model's default feature extractor parameters.
</Tip>
Modify any of the [`ViTFeatureExtractor`] parameters to create your custom feature extractor:
For models that support multimodal tasks, 🤗 Transformers offers a processor class that conveniently wraps a feature extractor and tokenizer into a single object. For example, let's use the [`Wav2Vec2Processor`] for an automatic speech recognition task (ASR). ASR transcribes audio to text, so you will need a feature extractor and a tokenizer.
Create a feature extractor to handle the audio inputs:
```py
>>> from transformers import Wav2Vec2FeatureExtractor
With two basic classes - configuration and model - and an additional preprocessing class (tokenizer, feature extractor, or processor), you can create any of the models supported by 🤗 Transformers. Each of these base classes are configurable, allowing you to use the specific attributes you want. You can easily setup a model for training or modify an existing pretrained model to fine-tune.
Now to send the model to the Hub, make sure you are logged in. Either run in your terminal:
```bash
huggingface-cli login
```
or from a notebook:
```py
from huggingface_hub import notebook_login
notebook_login()
```
You can then push to your own namespace (or an organization you are a member of) like this:
```py
resnet50d.push_to_hub("custom-resnet50d")
```
On top of the modeling weights and the configuration in json format, this also copied the modeling and
configuration `.py` files in the folder `custom-resnet50d` and uploaded the result to the Hub. You can check the result
in this [model repo](https://huggingface.co/sgugger/custom-resnet50d).
See the [sharing tutorial](model_sharing) for more information on the push to Hub method.
## Using a model with custom code
You can use any configuration, model or tokenizer with custom code files in its repository with the auto-classes and
the `from_pretrained` method. All files and code uploaded to the Hub are scanned for malware (refer to the [Hub security](https://huggingface.co/docs/hub/security#malware-scanning) documentation for more information), but you should still
review the model code and author to avoid executing malicious code on your machine. Set `trust_remote_code=True` to use
a model with custom code:
```py
from transformers import AutoModelForImageClassification
model = AutoModelForImageClassification.from_pretrained("sgugger/custom-resnet50d", trust_remote_code=True)
```
It is also strongly encouraged to pass a commit hash as a `revision` to make sure the author of the models did not
update the code with some malicious new lines (unless you fully trust the authors of the models).
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Debugging
## Multi-GPU Network Issues Debug
When training or inferencing with `DistributedDataParallel` and multiple GPU, if you run into issue of inter-communication between processes and/or nodes, you can use the following script to diagnose network issues.
This will dump a lot of NCCL-related debug information, which you can then search online if you find that some problems are reported. Or if you're not sure how to interpret the output you can share the log file in an Issue.
## Underflow and Overflow Detection
<Tip>
This feature is currently available for PyTorch-only.
</Tip>
<Tip>
For multi-GPU training it requires DDP (`torch.distributed.launch`).
</Tip>
<Tip>
This feature can be used with any `nn.Module`-based model.
</Tip>
If you start getting `loss=NaN` or the model inhibits some other abnormal behavior due to `inf` or `nan` in
activations or weights one needs to discover where the first underflow or overflow happens and what led to it. Luckily
you can accomplish that easily by activating a special module that will do the detection automatically.
If you're using [`Trainer`], you just need to add:
```bash
--debug underflow_overflow
```
to the normal command line arguments, or pass `debug="underflow_overflow"` when creating the
[`TrainingArguments`] object.
If you're using your own training loop or another Trainer you can accomplish the same with:
```python
from .debug_utils import DebugUnderflowOverflow
debug_overflow = DebugUnderflowOverflow(model)
```
[`~debug_utils.DebugUnderflowOverflow`] inserts hooks into the model that immediately after each
forward call will test input and output variables and also the corresponding module's weights. As soon as `inf` or
`nan` is detected in at least one element of the activations or weights, the program will assert and print a report
like this (this was caught with `google/mt5-small` under fp16 mixed precision):
The example output has been trimmed in the middle for brevity.
The second column shows the value of the absolute largest element, so if you have a closer look at the last few frames,
the inputs and outputs were in the range of `1e4`. So when this training was done under fp16 mixed precision the very
last step overflowed (since under `fp16` the largest number before `inf` is `64e3`). To avoid overflows under
`fp16` the activations must remain way below `1e4`, because `1e4 * 1e4 = 1e8` so any matrix multiplication with
large activations is going to lead to a numerical overflow condition.
At the very start of the trace you can discover at which batch number the problem occurred (here `Detected inf/nan during batch_number=0` means the problem occurred on the first batch).
Each reported frame starts by declaring the fully qualified entry for the corresponding module this frame is reporting
for. If we look just at this frame:
```
encoder.block.2.layer.1.layer_norm T5LayerNorm
8.69e-02 4.18e-01 weight
2.65e-04 3.42e+03 input[0]
1.79e-06 4.65e+00 output
```
Here, `encoder.block.2.layer.1.layer_norm` indicates that it was a layer norm for the first layer, of the second
block of the encoder. And the specific calls of the `forward` is `T5LayerNorm`.
Let's look at the last few frames of that report:
```
Detected inf/nan during batch_number=0
Last 21 forward frames:
abs min abs max metadata
[...]
encoder.block.2.layer.1.DenseReluDense.wi_0 Linear
2.17e-07 4.50e+00 weight
1.79e-06 4.65e+00 input[0]
2.68e-06 3.70e+01 output
encoder.block.2.layer.1.DenseReluDense.wi_1 Linear
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Use tokenizers from 🤗 Tokenizers
The [`PreTrainedTokenizerFast`] depends on the [🤗 Tokenizers](https://huggingface.co/docs/tokenizers) library. The tokenizers obtained from the 🤗 Tokenizers library can be
loaded very simply into 🤗 Transformers.
Before getting in the specifics, let's first start by creating a dummy tokenizer in a few lines:
```python
>>> from tokenizers import Tokenizer
>>> from tokenizers.models import BPE
>>> from tokenizers.trainers import BpeTrainer
>>> from tokenizers.pre_tokenizers import Whitespace
These tokens can then be converted into IDs which are understandable by the model. This can be done by directly feeding
the sentence to the tokenizer, which leverages the Rust implementation of [🤗 Tokenizers](https://github.com/huggingface/tokenizers) for peak performance.
```python
>>> inputs = tokenizer(sequence)
```
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The
The first sequence, the "context" used for the question, has all its tokens represented by a `0`, whereas the
second sequence, corresponding to the "question", has all its tokens represented by a `1`.
Some models, like [`XLNetModel`] use an additional token represented by a `2`.
### Position IDs
Contrary to RNNs that have the position of each token embedded within them, transformers are unaware of the position of
each token. Therefore, the position IDs (`position_ids`) are used by the model to identify each token's position in
the list of tokens.
They are an optional parameter. If no `position_ids` are passed to the model, the IDs are automatically created as
absolute positional embeddings.
Absolute positional embeddings are selected in the range `[0, config.max_position_embeddings - 1]`. Some models use
other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
### Labels
The labels are an optional argument which can be passed in order for the model to compute the loss itself. These labels
should be the expected prediction of the model: it will use the standard loss in order to compute the loss between its
predictions and the expected value (the label).
These labels are different according to the model head, for example:
- For sequence classification models (e.g., [`BertForSequenceClassification`]), the model expects a
tensor of dimension `(batch_size)` with each value of the batch corresponding to the expected label of the
entire sequence.
- For token classification models (e.g., [`BertForTokenClassification`]), the model expects a tensor
of dimension `(batch_size, seq_length)` with each value corresponding to the expected label of each individual
token.
- For masked language modeling (e.g., [`BertForMaskedLM`]), the model expects a tensor of dimension
`(batch_size, seq_length)` with each value corresponding to the expected label of each individual token: the
labels being the token ID for the masked token, and values to be ignored for the rest (usually -100).
- For sequence to sequence tasks,(e.g., [`BartForConditionalGeneration`],
[`MBartForConditionalGeneration`]), the model expects a tensor of dimension `(batch_size, tgt_seq_length)` with each value corresponding to the target sequences associated with each input sequence. During
training, both *BART* and *T5* will make the appropriate *decoder_input_ids* and decoder attention masks internally.
They usually do not need to be supplied. This does not apply to models leveraging the Encoder-Decoder framework. See
the documentation of each model for more information on each specific model's labels.
The base models (e.g., [`BertModel`]) do not accept labels, as these are the base transformer
models, simply outputting features.
### Decoder input IDs
This input is specific to encoder-decoder models, and contains the input IDs that will be fed to the decoder. These
inputs should be used for sequence to sequence tasks, such as translation or summarization, and are usually built in a
way specific to each model.
Most encoder-decoder models (BART, T5) create their `decoder_input_ids` on their own from the `labels`. In
such models, passing the `labels` is the preferred way to handle training.
Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
### Feed Forward Chunking
In each residual attention block in transformers the self-attention layer is usually followed by 2 feed forward layers.
The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (e.g., for
`bert-base-uncased`).
For an input of size `[batch_size, sequence_length]`, the memory required to store the intermediate feed forward
embeddings `[batch_size, sequence_length, config.intermediate_size]` can account for a large fraction of the memory
use. The authors of [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) noticed that since the
computation is independent of the `sequence_length` dimension, it is mathematically equivalent to compute the output
embeddings of both feed forward layers `[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n`
individually and concat them afterward to `[batch_size, sequence_length, config.hidden_size]` with `n = sequence_length`, which trades increased computation time against reduced memory use, but yields a mathematically
**equivalent** result.
For models employing the function [`apply_chunking_to_forward`], the `chunk_size` defines the
number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time
complexity. If `chunk_size` is set to 0, no feed forward chunking is done.
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# 🤗 Transformers
State-of-the-art Machine Learning for PyTorch, TensorFlow and JAX.
🤗 Transformers provides APIs to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs, carbon footprint, and save you time from training a model from scratch. The models can be used across different modalities such as:
* 📝 Text: text classification, information extraction, question answering, summarization, translation, and text generation in over 100 languages.
* 🖼️ Images: image classification, object detection, and segmentation.
* 🗣️ Audio: speech recognition and audio classification.
* 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
Our library supports seamless integration between three of the most popular deep learning libraries: [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/) and [JAX](https://jax.readthedocs.io/en/latest/). Train your model in three lines of code in one framework, and load it for inference with another.
Each 🤗 Transformers architecture is defined in a standalone Python module so they can be easily customized for research and experiments.
## If you are looking for custom support from the Hugging Face team
- **GET STARTED** contains a quick tour and installation instructions to get up and running with 🤗 Transformers.
- **TUTORIALS** are a great place to begin if you are new to our library. This section will help you gain the basic skills you need to start using 🤗 Transformers.
- **HOW-TO GUIDES** will show you how to achieve a specific goal like fine-tuning a pretrained model for language modeling or how to create a custom model head.
- **CONCEPTUAL GUIDES** provides more discussion and explanation of the underlying concepts and ideas behind models, tasks, and the design philosophy of 🤗 Transformers.
- **API** describes each class and function, grouped in:
- **MAIN CLASSES** for the main classes exposing the important APIs of the library.
- **MODELS** for the classes and functions related to each model implemented in the library.
- **INTERNAL HELPERS** for the classes and functions we use internally.
The library currently contains JAX, PyTorch and TensorFlow implementations, pretrained model weights, usage scripts and conversion utilities for the following models.
### Supported models
<!--This list is updated automatically from the README with _make fix-copies_. Do not update manually! -->
1. **[ALBERT](model_doc/albert)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](model_doc/bart)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BARThez](model_doc/barthez)** (from École polytechnique) released with the paper [BARThez: a Skilled Pretrained French Sequence-to-Sequence Model](https://arxiv.org/abs/2010.12321) by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
1. **[BARTpho](model_doc/bartpho)** (from VinAI Research) released with the paper [BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese](https://arxiv.org/abs/2109.09701) by Nguyen Luong Tran, Duong Minh Le and Dat Quoc Nguyen.
1. **[BEiT](model_doc/beit)** (from Microsoft) released with the paper [BEiT: BERT Pre-Training of Image Transformers](https://arxiv.org/abs/2106.08254) by Hangbo Bao, Li Dong, Furu Wei.
1. **[BERT](model_doc/bert)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](model_doc/bert-generation)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[BERTweet](model_doc/bertweet)** (from VinAI Research) released with the paper [BERTweet: A pre-trained language model for English Tweets](https://aclanthology.org/2020.emnlp-demos.2/) by Dat Quoc Nguyen, Thanh Vu and Anh Tuan Nguyen.
1. **[BigBird-Pegasus](model_doc/bigbird_pegasus)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[BigBird-RoBERTa](model_doc/big_bird)** (from Google Research) released with the paper [Big Bird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062) by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
1. **[Blenderbot](model_doc/blenderbot)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BlenderbotSmall](model_doc/blenderbot-small)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[BLOOM](model_doc/bloom)** (from BigScience workshop) released by the [BigSicence Workshop](https://bigscience.huggingface.co/).
1. **[BORT](model_doc/bort)** (from Alexa) released with the paper [Optimal Subarchitecture Extraction For BERT](https://arxiv.org/abs/2010.10499) by Adrian de Wynter and Daniel J. Perry.
1. **[ByT5](model_doc/byt5)** (from Google Research) released with the paper [ByT5: Towards a token-free future with pre-trained byte-to-byte models](https://arxiv.org/abs/2105.13626) by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
1. **[CamemBERT](model_doc/camembert)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CANINE](model_doc/canine)** (from Google Research) released with the paper [CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation](https://arxiv.org/abs/2103.06874) by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
1. **[CLIP](model_doc/clip)** (from OpenAI) released with the paper [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
1. **[CodeGen](model_doc/codegen)** (from Salesforce) released with the paper [A Conversational Paradigm for Program Synthesis](https://arxiv.org/abs/2203.13474) by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong.
1. **[ConvBERT](model_doc/convbert)** (from YituTech) released with the paper [ConvBERT: Improving BERT with Span-based Dynamic Convolution](https://arxiv.org/abs/2008.02496) by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
1. **[ConvNeXT](model_doc/convnext)** (from Facebook AI) released with the paper [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.
1. **[CPM](model_doc/cpm)** (from Tsinghua University) released with the paper [CPM: A Large-scale Generative Chinese Pre-trained Language Model](https://arxiv.org/abs/2012.00413) by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
1. **[CTRL](model_doc/ctrl)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[CvT](model_doc/cvt)** (from Microsoft) released with the paper [CvT: Introducing Convolutions to Vision Transformers](https://arxiv.org/abs/2103.15808) by Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
1. **[Data2Vec](model_doc/data2vec)** (from Facebook) released with the paper [Data2Vec: A General Framework for Self-supervised Learning in Speech, Vision and Language](https://arxiv.org/abs/2202.03555) by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli.
1. **[DeBERTa](model_doc/deberta)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DeBERTa-v2](model_doc/deberta-v2)** (from Microsoft) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[Decision Transformer](model_doc/decision_transformer)** (from Berkeley/Facebook/Google) released with the paper [Decision Transformer: Reinforcement Learning via Sequence Modeling](https://arxiv.org/abs/2106.01345) by Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch.
1. **[DeiT](model_doc/deit)** (from Facebook) released with the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
1. **[DETR](model_doc/detr)** (from Facebook) released with the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
1. **[DialoGPT](model_doc/dialogpt)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](model_doc/distilbert)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) and a German version of DistilBERT.
1. **[DiT](model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPR](model_doc/dpr)** (from Facebook) released with the paper [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[DPT](master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[ELECTRA](model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[EncoderDecoder](model_doc/encoder-decoder)** (from Google Research) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[FlauBERT](model_doc/flaubert)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[FLAVA](model_doc/flava)** (from Facebook AI) released with the paper [FLAVA: A Foundational Language And Vision Alignment Model](https://arxiv.org/abs/2112.04482) by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela.
1. **[FNet](model_doc/fnet)** (from Google Research) released with the paper [FNet: Mixing Tokens with Fourier Transforms](https://arxiv.org/abs/2105.03824) by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
1. **[Funnel Transformer](model_doc/funnel)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GLPN](model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://arxiv.org/abs/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
1. **[GPT](model_doc/openai-gpt)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT Neo](model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
1. **[GPT NeoX](model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://arxiv.org/abs/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
1. **[GPT-2](model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[GPT-J](model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
1. **[GroupViT](model_doc/groupvit)** (from UCSD, NVIDIA) released with the paper [GroupViT: Semantic Segmentation Emerges from Text Supervision](https://arxiv.org/abs/2202.11094) by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang.
1. **[Hubert](model_doc/hubert)** (from Facebook) released with the paper [HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units](https://arxiv.org/abs/2106.07447) by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
1. **[I-BERT](model_doc/ibert)** (from Berkeley) released with the paper [I-BERT: Integer-only BERT Quantization](https://arxiv.org/abs/2101.01321) by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
1. **[ImageGPT](model_doc/imagegpt)** (from OpenAI) released with the paper [Generative Pretraining from Pixels](https://openai.com/blog/image-gpt/) by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever.
1. **[LayoutLM](model_doc/layoutlm)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[LayoutLMv2](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding](https://arxiv.org/abs/2012.14740) by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
1. **[LayoutLMv3](model_doc/layoutlmv3)** (from Microsoft Research Asia) released with the paper [LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking](https://arxiv.org/abs/2204.08387) by Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei.
1. **[LayoutXLM](model_doc/layoutlmv2)** (from Microsoft Research Asia) released with the paper [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836) by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
1. **[LED](model_doc/led)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LeViT](model_doc/levit)** (from Meta AI) released with the paper [LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference](https://arxiv.org/abs/2104.01136) by Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze.
1. **[Longformer](model_doc/longformer)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LongT5](model_doc/longt5)** (from Google AI) released with the paper [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916) by Mandy Guo, Joshua Ainslie, David Uthus, Santiago Ontanon, Jianmo Ni, Yun-Hsuan Sung, Yinfei Yang.
1. **[LUKE](model_doc/luke)** (from Studio Ousia) released with the paper [LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention](https://arxiv.org/abs/2010.01057) by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
1. **[LXMERT](model_doc/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[M-CTC-T](model_doc/mctct)** (from Facebook) released with the paper [Pseudo-Labeling For Massively Multilingual Speech Recognition](https://arxiv.org/abs/2111.00161) by Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, and Ronan Collobert.
1. **[M2M100](model_doc/m2m_100)** (from Facebook) released with the paper [Beyond English-Centric Multilingual Machine Translation](https://arxiv.org/abs/2010.11125) by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
1. **[MarianMT](model_doc/marian)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MaskFormer](model_doc/maskformer)** (from Meta and UIUC) released with the paper [Per-Pixel Classification is Not All You Need for Semantic Segmentation](https://arxiv.org/abs/2107.06278) by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.
1. **[mBART](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[mBART-50](model_doc/mbart)** (from Facebook) released with the paper [Multilingual Translation with Extensible Multilingual Pretraining and Finetuning](https://arxiv.org/abs/2008.00401) by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
1. **[Megatron-BERT](model_doc/megatron-bert)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[Megatron-GPT2](model_doc/megatron_gpt2)** (from NVIDIA) released with the paper [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/abs/1909.08053) by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
1. **[mLUKE](model_doc/mluke)** (from Studio Ousia) released with the paper [mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models](https://arxiv.org/abs/2110.08151) by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka.
1. **[MobileBERT](model_doc/mobilebert)** (from CMU/Google Brain) released with the paper [MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices](https://arxiv.org/abs/2004.02984) by Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou.
1. **[MobileViT](model_doc/mobilevit)** (from Apple) released with the paper [MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer](https://arxiv.org/abs/2110.02178) by Sachin Mehta and Mohammad Rastegari.
1. **[MPNet](model_doc/mpnet)** (from Microsoft Research) released with the paper [MPNet: Masked and Permuted Pre-training for Language Understanding](https://arxiv.org/abs/2004.09297) by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
1. **[MT5](model_doc/mt5)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[MVP](model_doc/mvp)** (from RUC AI Box) released with the paper [MVP: Multi-task Supervised Pre-training for Natural Language Generation](https://arxiv.org/abs/2206.12131) by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen.
1. **[Nezha](model_doc/nezha)** (from Huawei Noah’s Ark Lab) released with the paper [NEZHA: Neural Contextualized Representation for Chinese Language Understanding](https://arxiv.org/abs/1909.00204) by Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen and Qun Liu.
1. **[NLLB](model_doc/nllb)** (from Meta) released with the paper [No Language Left Behind: Scaling Human-Centered Machine Translation](https://arxiv.org/abs/2207.04672) by the NLLB team.
1. **[Nyströmformer](model_doc/nystromformer)** (from the University of Wisconsin - Madison) released with the paper [Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention](https://arxiv.org/abs/2102.03902) by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, Vikas Singh.
1. **[OPT](master/model_doc/opt)** (from Meta AI) released with the paper [OPT: Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) by Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen et al.
1. **[OWL-ViT](model_doc/owlvit)** (from Google AI) released with the paper [Simple Open-Vocabulary Object Detection with Vision Transformers](https://arxiv.org/abs/2205.06230) by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby.
1. **[Pegasus](model_doc/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[Perceiver IO](model_doc/perceiver)** (from Deepmind) released with the paper [Perceiver IO: A General Architecture for Structured Inputs & Outputs](https://arxiv.org/abs/2107.14795) by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.
1. **[PhoBERT](model_doc/phobert)** (from VinAI Research) released with the paper [PhoBERT: Pre-trained language models for Vietnamese](https://www.aclweb.org/anthology/2020.findings-emnlp.92/) by Dat Quoc Nguyen and Anh Tuan Nguyen.
1. **[PLBart](model_doc/plbart)** (from UCLA NLP) released with the paper [Unified Pre-training for Program Understanding and Generation](https://arxiv.org/abs/2103.06333) by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.
1. **[PoolFormer](model_doc/poolformer)** (from Sea AI Labs) released with the paper [MetaFormer is Actually What You Need for Vision](https://arxiv.org/abs/2111.11418) by Yu, Weihao and Luo, Mi and Zhou, Pan and Si, Chenyang and Zhou, Yichen and Wang, Xinchao and Feng, Jiashi and Yan, Shuicheng.
1. **[ProphetNet](model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[QDQBert](model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[RAG](model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RegNet](model_doc/regnet)** (from META Platforms) released with the paper [Designing Network Design Space](https://arxiv.org/abs/2003.13678) by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.
1. **[RemBERT](model_doc/rembert)** (from Google Research) released with the paper [Rethinking embedding coupling in pre-trained language models](https://arxiv.org/abs/2010.12821) by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
1. **[ResNet](model_doc/resnet)** (from Microsoft Research) released with the paper [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
1. **[RoBERTa](model_doc/roberta)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[RoFormer](model_doc/roformer)** (from ZhuiyiTechnology), released together with the paper [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864) by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
1. **[SegFormer](model_doc/segformer)** (from NVIDIA) released with the paper [SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers](https://arxiv.org/abs/2105.15203) by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo.
1. **[SEW](model_doc/sew)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SEW-D](model_doc/sew_d)** (from ASAPP) released with the paper [Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition](https://arxiv.org/abs/2109.06870) by Felix Wu, Kwangyoun Kim, Jing Pan, Kyu Han, Kilian Q. Weinberger, Yoav Artzi.
1. **[SpeechToTextTransformer](model_doc/speech_to_text)** (from Facebook), released together with the paper [fairseq S2T: Fast Speech-to-Text Modeling with fairseq](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
1. **[SpeechToTextTransformer2](model_doc/speech_to_text_2)** (from Facebook), released together with the paper [Large-Scale Self- and Semi-Supervised Learning for Speech Translation](https://arxiv.org/abs/2104.06678) by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
1. **[Splinter](model_doc/splinter)** (from Tel Aviv University), released together with the paper [Few-Shot Question Answering by Pretraining Span Selection](https://arxiv.org/abs/2101.00438) by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
1. **[SqueezeBERT](model_doc/squeezebert)** (from Berkeley) released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[Swin Transformer](model_doc/swin)** (from Microsoft) released with the paper [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
1. **[Swin Transformer V2](model_doc/swinv2)** (from Microsoft) released with the paper [Swin Transformer V2: Scaling Up Capacity and Resolution](https://arxiv.org/abs/2111.09883) by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.
1. **[T5](model_doc/t5)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[T5v1.1](model_doc/t5v1.1)** (from Google AI) released in the repository [google-research/text-to-text-transfer-transformer](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#t511) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[TAPAS](model_doc/tapas)** (from Google AI) released with the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349) by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
1. **[TAPEX](model_doc/tapex)** (from Microsoft Research) released with the paper [TAPEX: Table Pre-training via Learning a Neural SQL Executor](https://arxiv.org/abs/2107.07653) by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou.
1. **[Trajectory Transformer](model_doc/trajectory_transformers)** (from the University of California at Berkeley) released with the paper [Offline Reinforcement Learning as One Big Sequence Modeling Problem](https://arxiv.org/abs/2106.02039) by Michael Janner, Qiyang Li, Sergey Levine
1. **[Transformer-XL](model_doc/transfo-xl)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[TrOCR](model_doc/trocr)** (from Microsoft), released together with the paper [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) by Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei.
1. **[UL2](model_doc/ul2)** (from Google Research) released with the paper [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1) by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler
1. **[UniSpeech](model_doc/unispeech)** (from Microsoft Research) released with the paper [UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data](https://arxiv.org/abs/2101.07597) by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.
1. **[UniSpeechSat](model_doc/unispeech-sat)** (from Microsoft Research) released with the paper [UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING](https://arxiv.org/abs/2110.05752) by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.
1. **[VAN](model_doc/van)** (from Tsinghua University and Nankai University) released with the paper [Visual Attention Network](https://arxiv.org/abs/2202.09741) by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.
1. **[VideoMAE](model_doc/videomae)** (from Multimedia Computing Group, Nanjing University) released with the paper [VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training](https://arxiv.org/abs/2203.12602) by Zhan Tong, Yibing Song, Jue Wang, Limin Wang.
1. **[ViLT](model_doc/vilt)** (from NAVER AI Lab/Kakao Enterprise/Kakao Brain) released with the paper [ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision](https://arxiv.org/abs/2102.03334) by Wonjae Kim, Bokyung Son, Ildoo Kim.
1. **[Vision Transformer (ViT)](model_doc/vit)** (from Google AI) released with the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
1. **[VisualBERT](model_doc/visual_bert)** (from UCLA NLP) released with the paper [VisualBERT: A Simple and Performant Baseline for Vision and Language](https://arxiv.org/pdf/1908.03557) by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
1. **[ViTMAE](model_doc/vit_mae)** (from Meta AI) released with the paper [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377) by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.
1. **[Wav2Vec2](model_doc/wav2vec2)** (from Facebook AI) released with the paper [wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations](https://arxiv.org/abs/2006.11477) by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
1. **[Wav2Vec2-Conformer](model_doc/wav2vec2-conformer)** (from Facebook AI) released with the paper [FAIRSEQ S2T: Fast Speech-to-Text Modeling with FAIRSEQ](https://arxiv.org/abs/2010.05171) by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino.
1. **[Wav2Vec2Phoneme](model_doc/wav2vec2_phoneme)** (from Facebook AI) released with the paper [Simple and Effective Zero-shot Cross-lingual Phoneme Recognition](https://arxiv.org/abs/2109.11680) by Qiantong Xu, Alexei Baevski, Michael Auli.
1. **[WavLM](model_doc/wavlm)** (from Microsoft Research) released with the paper [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.
1. **[XGLM](model_doc/xglm)** (From Facebook AI) released with the paper [Few-shot Learning with Multilingual Language Models](https://arxiv.org/abs/2112.10668) by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[XLS-R](model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
1. **[YOLOS](model_doc/yolos)** (from Huazhong University of Science & Technology) released with the paper [You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection](https://arxiv.org/abs/2106.00666) by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu.
1. **[YOSO](model_doc/yoso)** (from the University of Wisconsin - Madison) released with the paper [You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling](https://arxiv.org/abs/2111.09714) by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.
### Supported frameworks
The table below represents the current support in the library for each of those models, whether they have a Python
tokenizer (called "slow"). A "fast" tokenizer backed by the 🤗 Tokenizers library, whether they have support in Jax (via
Flax), PyTorch, and/or TensorFlow.
<!--This table is updated automatically from the auto modules with _make fix-copies_. Do not update manually!-->
| Model | Tokenizer slow | Tokenizer fast | PyTorch support | TensorFlow support | Flax Support |
Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Installation
Install 🤗 Transformers for whichever deep learning library you're working with, setup your cache, and optionally configure 🤗 Transformers to run offline.
🤗 Transformers is tested on Python 3.6+, PyTorch 1.1.0+, TensorFlow 2.0+, and Flax. Follow the installation instructions below for the deep learning library you are using:
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). A virtual environment makes it easier to manage different projects, and avoid compatibility issues between dependencies.
Start by creating a virtual environment in your project directory:
```bash
python -m venv .env
```
Activate the virtual environment. On Linux and MacOs:
```bash
source .env/bin/activate
```
Activate Virtual environment on Windows
```bash
.env/Scripts/activate
```
Now you're ready to install 🤗 Transformers with the following command:
```bash
pip install transformers
```
For CPU-support only, you can conveniently install 🤗 Transformers and a deep learning library in one line. For example, install 🤗 Transformers and PyTorch with:
```bash
pip install transformers[torch]
```
🤗 Transformers and TensorFlow 2.0:
```bash
pip install transformers[tf-cpu]
```
🤗 Transformers and Flax:
```bash
pip install transformers[flax]
```
Finally, check if 🤗 Transformers has been properly installed by running the following command. It will download a pretrained model:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"
This command installs the bleeding edge `main` version rather than the latest `stable` version. The `main` version is useful for staying up-to-date with the latest developments. For instance, if a bug has been fixed since the last official release but a new release hasn't been rolled out yet. However, this means the `main` version may not always be stable. We strive to keep the `main` version operational, and most issues are usually resolved within a few hours or a day. If you run into a problem, please open an [Issue](https://github.com/huggingface/transformers/issues) so we can fix it even sooner!
Check if 🤗 Transformers has been properly installed by running the following command:
```bash
python -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('I love you'))"
```
## Editable install
You will need an editable install if you'd like to:
* Use the `main` version of the source code.
* Contribute to 🤗 Transformers and need to test changes in the code.
Clone the repository and install 🤗 Transformers with the following commands:
These commands will link the folder you cloned the repository to and your Python library paths. Python will now look inside the folder you cloned to in addition to the normal library paths. For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.7/site-packages/`, Python will also search the folder you cloned to: `~/transformers/`.
<Tip warning={true}>
You must keep the `transformers` folder if you want to keep using the library.
</Tip>
Now you can easily update your clone to the latest version of 🤗 Transformers with the following command:
```bash
cd ~/transformers/
git pull
```
Your Python environment will find the `main` version of 🤗 Transformers on the next run.
## Install with conda
Install from the conda channel `huggingface`:
```bash
conda install -c huggingface transformers
```
## Cache setup
Pretrained models are downloaded and locally cached at: `~/.cache/huggingface/hub`. This is the default directory given by the shell environment variable `TRANSFORMERS_CACHE`. On Windows, the default directory is given by `C:\Users\username\.cache\huggingface\hub`. You can change the shell environment variables shown below - in order of priority - to specify a different cache directory:
1. Shell environment variable (default): `HUGGINGFACE_HUB_CACHE` or `TRANSFORMERS_CACHE`.
🤗 Transformers will use the shell environment variables `PYTORCH_TRANSFORMERS_CACHE` or `PYTORCH_PRETRAINED_BERT_CACHE` if you are coming from an earlier iteration of this library and have set those environment variables, unless you specify the shell environment variable `TRANSFORMERS_CACHE`.
</Tip>
## Offline mode
🤗 Transformers is able to run in a firewalled or offline environment by only using local files. Set the environment variable `TRANSFORMERS_OFFLINE=1` to enable this behavior.
<Tip>
Add [🤗 Datasets](https://huggingface.co/docs/datasets/) to your offline training workflow by setting the environment variable `HF_DATASETS_OFFLINE=1`.
</Tip>
For example, you would typically run a program on a normal network firewalled to external instances with the following command:
The script should now run without hanging or waiting to timeout because it knows it should only look for local files.
### Fetch models and tokenizers to use offline
Another option for using 🤗 Transformers offline is to download the files ahead of time, and then point to their local path when you need to use them offline. There are three ways to do this:
* Download a file through the user interface on the [Model Hub](https://huggingface.co/models) by clicking on the ↓ icon.
>>> model = AutoModel.from_pretrained("./your/path/bigscience_t0")
```
* Programmatically download files with the [huggingface_hub](https://github.com/huggingface/huggingface_hub/tree/main/src/huggingface_hub) library:
1. Install the `huggingface_hub` library in your virtual environment:
```bash
python -m pip install huggingface_hub
```
2. Use the [`hf_hub_download`](https://huggingface.co/docs/hub/adding-a-library#download-files-from-the-hub) function to download a file to a specific path. For example, the following command downloads the `config.json` file from the [T0](https://huggingface.co/bigscience/T0_3B) model to your desired path:
See the [How to download files from the Hub](https://huggingface.co/docs/hub/how-to-downstream) section for more details on downloading files stored on the Hub.
</Tip>
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.