* working on LongformerForSequenceClassification
* add TFLongformerForMultipleChoice
* add TFLongformerForTokenClassification
* use add_start_docstrings_to_model_forward
* test TFLongformerForSequenceClassification
* test TFLongformerForMultipleChoice
* test TFLongformerForTokenClassification
* remove test from repo
* add test and doc for TFLongformerForSequenceClassification, TFLongformerForTokenClassification, TFLongformerForMultipleChoice
* add requested classes to modeling_tf_auto.py
update dummy_tf_objects
fix tests
fix bugs in requested classes
* pass all tests except test_inputs_embeds
* sync with master
* pass all tests except test_inputs_embeds
* pass all tests
* pass all tests
* work on test_inputs_embeds
* fix style and quality
* make multi choice work
* fix TFLongformerForTokenClassification signature
* fix TFLongformerForMultipleChoice, TFLongformerForSequenceClassification signature
* fix mult choice
* fix mc hint
* fix input embeds
* fix input embeds
* refactor input embeds
* fix copy issue
* apply Sylvain's changes and clean up more
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Updated the Extractive Question Answering code snippets
The Extractive Question Answering code snippets do not work anymore since the models return task-specific output objects. This commit fixes the PyTorch and TensorFlow examples by adding `.values()` to the model call.
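For illustration, a minimal sketch of the fixed PyTorch pattern (checkpoint name and example text chosen only for illustration, not taken from the docs snippet itself):

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "Where does Sylvain work?"
context = "Sylvain works at Hugging Face in Brooklyn."
inputs = tokenizer(question, context, return_tensors="pt")

# The model returns a task-specific output object, so unpack it with .values()
start_scores, end_scores = model(**inputs, return_dict=True).values()

answer_start = int(torch.argmax(start_scores))  # most likely start of the answer
answer_end = int(torch.argmax(end_scores)) + 1  # most likely end of the answer (exclusive)
answer = tokenizer.decode(inputs["input_ids"][0][answer_start:answer_end].tolist())
print(answer)
```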
* Update task_summary.rst
Modified the Model in Action section. The class `AutoModelWithLMHead` is deprecated, so it was changed to `AutoModelForSeq2SeqLM` for encoder-decoder models. Removed the duplicate eos token.
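A hedged sketch of the replacement pattern (the T5 checkpoint and prompt are only examples, not the exact snippet from task_summary.rst):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# AutoModelWithLMHead is deprecated; for encoder-decoder checkpoints use AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: Hello, my dog is cute", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```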
* Adding PrefixConstrainedLogitsProcessor
* fixing RAG and style_doc
* fixing black (v20 instead of v19)
* Improving doc in generation_logits_process.py
* Improving docs and typing in generation_utils.py
* docs improvement
* adding test and fixing doc typo
* fixing doc_len
* isort on test
* fixed test
* improve docstring a bit
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* make tr_loss regular float
* Revert "make tr_loss regular float"
This reverts commit c9d7ccfaf0c4387187b0841694f01ec0ffd5f4ba.
* reset loss at each logging step
* keep track of total loss with _total_loss_scalar
* add remaining tr_loss at the end
* Tokenizers should be framework agnostic
* Run the slow tests
* Not testing
* Fix documentation
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* <small>tiny typo</small>
* Tokenizers: ability to load from model subfolder
* use subfolder for local files as well
* Uniformize model shortcut name => model id
* from s3 => from huggingface.co
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
* Put models in subfolders
* Styling
* Fix imports in tests
* More fixes in test imports
* Sneaky hidden imports
* Fix imports in doc files
* More sneaky imports
* Finish fixing tests
* Fix examples
* Fix path for copies
* More fixes for examples
* Fix dummy files
* More fixes for example
* More model import fixes
* Is this why you're unhappy GitHub?
* Fix imports in convert command
* Use the CI to identify failing tests
* Remove from all examples and tests
* More default switch
* Fixes
* More test fixes
* More fixes
* Last fixes hopefully
* Run on the real suite
* Fix slow tests
* Fix passing token_type_ids during GPT2DoubleHeadsModel.generate() if used
and for GPT2LMHeadModel too
* Update tests to check token_type_ids usage in GPT2 models
* Simply insert T5Tokenizer's prepare_seq2seq_batch
* Update/Add some 'import'
* fix RuntimeError caused by '.view'
* Moves .view related error avoidance from seq2seq_trainer to inside prophetnet
* Update test_tokenization_prophetnet.py
* Format the test code with black
* Re-format the test code
* Update test_tokenization_prophetnet.py
* Add importing require_torch in the test code
* Add importing BatchEncoding in the test code
* Re-format the test code on Colab
* Fixing roberta for slow-fast tests
* WIP getting equivalence on pipelines
* slow-to-fast equivalence - working on question-answering pipeline
* optional FAISS tests
* Pipeline Q&A
* Move pipeline tests to their own test job again
* update tokenizer to add sequence id methods
* update to tokenizers 0.9.4
* set sentencepiece as optional
* clean up squad
* clean up pipelines to use sequence_ids
* style/quality
* wording
* Switch to use_fast = True by default
* update tests for use_fast at True by default
* fix rag tokenizer test
* removing protobuf from required dependencies
* fix NER test for use_fast = True by default
* fixing example tests (Q&A examples use slow tokenizers for now)
* protobuf in main deps extras["sentencepiece"] and example deps
* fix protobuf install test
* try to fix seq2seq by switching to slow tokenizers for now
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update some tests
* Small update
* Apply style
* Use max_position_embeddings
* Create a fake attribute
* Create a fake attribute
* Update wrong name
* Wrong TransfoXL model file
* Keep the common tests agnostic
* Update deploy-docs dependencies on CI to enable Flax
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added pair of ""
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* First addition of Flax/Jax documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* make style
* Ensure input order matches between Bert & Roberta
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Install dependencies "all" when building doc
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* wraps build_doc deps with ""
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing @sgugger comments.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use list to highlight JAX features.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Make style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Let's not look too much into the future for now.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Create modeling_tf_dpr.py
* Add TFDPR
* Add back TFPegasus, TFMarian, TFMBart, TFBlenderBot
last commit accidentally deleted these 4 lines, so I restored them
* Add TFDPR
* Add TFDPR
* clean up some comments, add TF input-style doc string
* Add TFDPR
* Make return_dict=False as default
* Fix return_dict bug (in .from_pretrained)
* Add get_input_embeddings()
* Create test_modeling_tf_dpr.py
The current version already passes all 27 tests!
Please see the test run at:
https://colab.research.google.com/drive/1czS_m9zy5k-iSJbzA_DP1k1xAAC_sdkf?usp=sharing
* fix quality
* delete init weights
* run fix copies
* fix repo consis
* del config_class, load_tf_weights
They should be 'PyTorch only'
* add config_class back
after removing it, the test failed ... so in the end only "use_tf_weights = None" was removed, per Lysandre's suggestion
* newline after .. note::
* import tf, np (Necessary for ModelIntegrationTest)
* slow_test from_pretrained with from_pt=True
At the moment we don't have TF weights (since we don't have an official TF model)
Previously, I did not run the slow tests, so I missed this bug
* Add simple TFDPRModelIntegrationTest
Note that this is just a test that TF and PyTorch give approximately the same output.
However, I could not test against the official DPR repo's output yet
* upload correct tf model
* remove position_ids as missing keys
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patrickvonplaten <patrick@huggingface.co>
The new run_ner.py script tries to run prediction on the input
test set `datasets["test"]`, but it should be the tokenized set
`tokenized_datasets["test"]`
* Add next sentence prediction loss computation
* Apply style
* Fix tests
* Add forgotten import
* Add forgotten import
* Use a new parameter
* Remove kwargs and use positional arguments
* [testing utils] get_auto_remove_tmp_dir default change
Now that I have been using `get_auto_remove_tmp_dir` for a while, I realized that the defaults aren't optimal.
99% of the time we want the tmp dir to be empty at the beginning of the test, so the default is changed to `before=True`. This shouldn't impact any tests since this feature is only used during debugging.
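A rough usage sketch of what the new default means in a test (the class and method exist in `transformers.testing_utils`; the behavior described in the comment is the one stated above, everything else is illustrative):

```python
from transformers.testing_utils import TestCasePlus


class ExampleTest(TestCasePlus):
    def test_something(self):
        # With the new default (before=True), the returned tmp dir is emptied
        # at the start of the test, which is what we want 99% of the time.
        tmp_dir = self.get_auto_remove_tmp_dir()
        # ... write test artifacts into tmp_dir ...
```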
* simplify things
* update docs
* fix doc layout
* style
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* better 3-state doc
* style
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* s/tmp/temporary/ + style
* correct the statement
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix typo
* rm use_cdn & references, and implement new hf_bucket_url
* I'm pretty sure we don't need to `read` this file
* same here
* [BIG] file_utils.networking: do not gobble up errors anymore
* Fix CI 😇
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Tiny doc tweak
* Add doc + pass kwarg everywhere
* Add more tests and explain
cc @sshleifer let me know if better
Co-Authored-By: Sam Shleifer <sshleifer@gmail.com>
* Also implement revision in pipelines
In the case where we're passing a task name or a string model identifier
* Fix CI 😇
* Fix CI
* [hf_api] new methods + command line implem
* make style
* Final endpoints post-migration
* Fix post-migration
* Py3.6 compat
cc @stefan-it
Thank you @stas00
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* add a multi-gpu job for all example tests
* run only ported tests
* rename
* explain why env is re-activated on each step
* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me
* style
* Apply suggestions from code review
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* add training tests
* correct longformer
* fix docs
* fix some tests
* fix some more train tests
* remove ipdb
* fix multiple edge case model training
* fix funnel and prophetnet
* clean gpt models
* undo renaming of albert
* Add new token classification example
* Remove txt file
* Add test
* With actual testing done
* Less warmup is better
* Update examples/token-classification/run_ner_new.py
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Address review comments
* Fix test
* Make Lysandre happy
* Last touches and rename
* Rename in tests
* Address review comments
* More run_ner -> run_ner_old
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Output cross-attention with decoder attention output
* Update src/transformers/modeling_bert.py
* add cross-attention for t5 and bart as well
* fix tests
* correct typo in docs
* apply Sylvain's and Sam's comments
* correct typo
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* use decorator
* remove hardcoded paths
* make the test use more data and do real quality tests
* shave off 10 secs
* add --eval_beams 2, reformat
* reduce train size, use smaller custom dataset
* Make Trainer evaluation handle dynamic seq_length
* Document behavior.
* Fix test
* Better fix
* Fixes for realsies this time
* Address review comments
* Without forgetting to save...
* Output global_attentions in Longformer models
* make style
* small refactoring
* fix tests
* make fix-copies
* add for tf as well
* remove comments in test
* make fix-copies
* make style
* add docs
* make docstring pretty
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* change TokenClassificationTask class methods to static methods
Since we do not require `self` in the class methods of TokenClassificationTask, we should switch to static methods. Also, since the class TokenClassificationTask does not contain a constructor, it is currently unusable as is; switching to static methods removes the need to document the intent of the broken class.
In addition, the get_labels and read_examples_from_file methods are meant to be implemented by subclasses. Static method definitions are unchanged even after inheritance, which means they can be overridden, similar to other class methods.
* Trigger Build
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
- The issue is that with previous code we would have the following:
```python
qa_pipeline = (...)
qa_pipeline(question="Where was he born ?", context="")
# -> IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
```
The goal here is to improve this to actually return a ValueError
wherever possible.
While at it, I tried to simplify QuestionArgumentHandler's code to make it smaller and more compact while keeping backward compatibility.
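A hypothetical sketch of the kind of early validation meant here (not the actual QuestionAnswering argument-handler code; function name made up for illustration):

```python
def normalize_qa_inputs(question: str, context: str) -> dict:
    """Validate a single question/context pair before it ever reaches the model."""
    if not isinstance(question, str) or not question.strip():
        raise ValueError("`question` must be a non-empty string")
    if not isinstance(context, str) or not context.strip():
        raise ValueError("`context` must be a non-empty string")
    return {"question": question, "context": context}
```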
* Bug fix: NER pipeline shouldn't group separate entities of same type
* style fix
* [Bug Fix] Shouldn't group entities that are both 'B' even if they are same type
(B-type1 B-type1) != (B-type1 I-type1)
[Bug Fix] add an option `ignore_subwords` to ignore subsequent ##wordpieces in predictions, because some models are trained on only the first token of a word and not on the subsequent wordpieces (the BERT NER default), so it makes sense to do the same thing at inference time.
The simplest fix is to just group the subwords with the first wordpiece.
[TODO] how to handle ignored scores? just set them to 0 and calculate zero invariant mean ?
[TODO] handle a different wordpiece_prefix than ## ? possible approaches:
get it from the tokenizer? but currently most tokenizers don't have a wordpiece_prefix property
have an _is_subword(token) helper
[Feature add] added option to `skip_special_tokens`, because it was harder to remove them after grouping.
[Additional Changes] remove B/I prefix on returned grouped_entities
[Feature Request/TODO] Return indexes?
[Bug TODO] can't use fast tokenizer with grouped_entities ('BertTokenizerFast' object has no attribute 'convert_tokens_to_string')
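To make the `ignore_subwords` idea above concrete, a standalone sketch of grouping WordPiece continuations (`##`-prefixed tokens) with their first wordpiece; the helper name is made up for illustration:

```python
def group_subwords(tokens, scores):
    """Merge '##' continuation pieces into the preceding word, keeping the first piece's score."""
    words, word_scores = [], []
    for token, score in zip(tokens, scores):
        if token.startswith("##") and words:
            words[-1] += token[2:]  # glue the continuation onto the current word
            # the continuation's score is ignored (only the first wordpiece counts)
        else:
            words.append(token)
            word_scores.append(score)
    return words, word_scores


# e.g. group_subwords(["Hu", "##gging", "Face"], [0.99, 0.10, 0.98])
# -> (["Hugging", "Face"], [0.99, 0.98])
```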
* use offset_mapping to fix [UNK] token problem
* ignore score for subwords
* modify ner_pipeline test
* modify ner_pipeline test
* modify ner_pipeline test
* ner_pipeline change ignore_subwords default to true
* add ner_pipeline ignore_subword=False test case
* fix offset_mapping index
* fix style again duh
* change is_subword and convert_tokens_to_string logic
* merge tests with new test structure
* change test names
* remove old tests
* ner tests for fast tokenizer
* fast tokenizers have convert_tokens_to_string
* Fix the incorrect merge
Co-authored-by: Ceyda Cinarel <snu-ceyda@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* make it possible to invoke conftest.py in both test suites without crashing when the same option is added
* perl -pi -e 's|--make_reports|--make-reports|' to be consistent with other opts
* add `pytest --make-reports` to all CIs (and artifacts)
* fix
* Updated ConversationalPipeline to work with encoder-decoder models (e.g. BlenderBot)
* Addition of integration test for EncoderDecoder conversation model
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* [FIX] TextGenerationPipeline is currently broken.
It's most likely due to #8180.
What's missing is a multi- vs. single-string handler at the beginning of the pipeline.
This pipeline also had no tests.
* Fixing Conversational tests too.
* first draft
* show design proposition for new generate method
* up
* make better readable
* make first version
* gpt2 tests pass
* make beam search for gpt2 work
* add first encoder-decoder code
* delete typo
* make t5 work
* save intermediate
* make bart work with beam search
* finish beam search bart / t5
* add default kwargs
* make more tests pass
* fix no bad words sampler
* some fixes and tests for all distribution processors
* fix test
* fix rag slow tests
* merge to master
* add nograd to generate
* make all slow tests pass
* speed up generate
* fix edge case bug
* small fix
* correct typo
* add type hints and docstrings
* fix typos in tests
* add beam search tests
* add tests for beam scorer
* fix test rag
* finish beam search tests
* move generation tests into a separate file
* fix generation tests
* more tests
* add aggressive generation tests
* fix tests
* add gpt2 sample test
* add more docstring
* add more docs
* finish doc strings
* apply some more of Sylvain's and Sam's comments
* fix some typos
* make fix copies
* apply Lysandre's and Sylvain's comments
* final corrections on examples
* small fix for reformer
* Make line by line optional in run_mlm
* Add option to disable dynamic padding
* Add option to plm too and update README
* Typos
* More typos
* Even more typos
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* make sure that logging_first_step evaluates
* fix bug with incorrect loss on logging_first_step
* fix style
* logging_first_step only logs, not evals
* Minor style improvements:
1. Use `@nn.compact` rather than `@compact` (so as not to make it seem like compact is a standard Python decorator).
2. Move attribute docstrings from two `__call__` methods to comments
on the attributes themselves. (This was probably a remnant from
the pre-Linen version where the attributes were arguments to
`call`.)
* Use black on the Flax modeling code
* Replace swish with silu
* revert nn.silu to nn.swish due to older version
* simplify optimized silu conditional and fix format
* Update activations.py
* Update activations_tf.py
* Update modeling_flax_utils.py
* Update modeling_openai.py
* add swish testcase
* add pytorch swish testcase
* Add more robust python version check
* more formatting fixes
Co-authored-by: TFUsers <TFUsers@gmail.com>
* Test TF GPU CI
* Change cache
* Fix missing torch requirement
* Fix some model tests
Style
* LXMERT
* MobileBERT
* Longformer skip test
* XLNet
* The rest of the tests
* RAG goes OOM in multi gpu setup
* YAML test files
* Last fixes
* Skip doctests
* Fill mask tests
* Yaml files
* Last test fix
* Style
* Update cache
* Change ONNX tests to slow + use tiny model
* Add a template for example scripts and apply it to mlm
* Formatting
* Fix test
* Add plm script
* Styling
* Add model card for Gujarati-XLM-R-Base
* Update README.md
Add the model card for the Gujarati-XLM-R-Base.
* Apply suggestions from code review
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* move the helper code into testing_utils
* port test_trainer_distributed to work with pytest
* improve docs
* simplify notes
* doc
* doc
* style
* doc
* further improvements
* torch might not be available
* real fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* New run_clm script
* Formatting
* More comments
* Remove unused imports
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Address review comments
* Change link to the hub
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* first attempt to add AzureML callbacks
* func arg fix
* var name fix, but still won't fix error...
* fixing as in https://discuss.huggingface.co/t/how-to-integrate-an-azuremlcallback-for-logging-in-azure/1713/2
* Avoid lint check of azureml import
* black compliance
* Make isort happy
* Fix point typo in docs
* Add AzureML to Callbacks docs
* Attempt to make sphinx happy
* Format callback docs
* Make documentation style happy
* Make docs compliant to style
Co-authored-by: Davide Fiocco <davide.fiocco@frontiersin.net>
* better reports
* a whole bunch of reports in their own files
* clean up
* improvements
* github artifacts experiment
* style
* complete the report generator with multiple improvements/fixes
* fix
* save all reports under one dir for easy upload
* can remove temp failing tests
* doc fix
* some cleanup
* Fix comet_ml import and ensure availability
* Make isort happy
* Make flake8 happy
* Don't show comet_ml warn if COMET_MODE=DISABLED
* Make isort happy
* Important files
* Styling them all
* Revert "Styling them all"
This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.
* Styling them for realsies
* Fix syntax error
* Fix benchmark_utils
* More fixes
* Fix modeling auto and script
* Remove new line
* Fixes
* More fixes
* Fix more files
* Style
* Add FSMT
* More fixes
* More fixes
* More fixes
* More fixes
* Fixes
* More fixes
* More fixes
* Last fixes
* Make sphinx happy
* model card for new cross-lingual sentence model
* bold text
* url spelling fix
* more url spelling fixes
* slight thanks change
* small improvements in text
* multilingual word xchange
* change colab link
* xval fold number
* add model links
* line break in model names
* Update README.md
* Update README.md
* new examples link
* new examples link
* add evaluation dataset name
* add more about multilingual
* typo fix
* typo
* typos
* hyperparameter typos
* hyperparameter typo
* add metadata
* add metadata
* Update README.md
* typo fix
* Small improvement
* Add MLflow integration class
Add integration code for MLflow in integrations.py along with the code
that checks that MLflow is installed.
* Add MLflowCallback import
Add import of MLflowCallback in trainer.py
* Handle model argument
Allow the callback to handle model argument and store model config items as hyperparameters.
* Log parameters to MLflow in batches
MLflow cannot log more than a hundred parameters at once.
Code added to split the parameters into batches of 100 items and log the batches one by one.
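A minimal sketch of the batching idea (assuming the standard `mlflow` client; the batch size of 100 comes from the MLflow per-call limit mentioned above, and the helper name is illustrative):

```python
import mlflow

MAX_LOG_BATCH = 100  # MLflow refuses more than ~100 params per log_params call


def log_params_in_batches(params: dict) -> None:
    items = list(params.items())
    for start in range(0, len(items), MAX_LOG_BATCH):
        # log one slice of at most 100 parameters at a time
        mlflow.log_params(dict(items[start:start + MAX_LOG_BATCH]))
```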
* Fix style
* Add docs on MLflow callback
* Fix issue with unfinished runs
The "fluent" API used in the MLflow integration allows only one run to be active at any given moment. If the Trainer is disposed of and a new one is created, but the training is not finished, it will refuse to log the results when the next trainer is created.
* Make Seq2Seq Trainer more similar to Trainer
* fix typo
* fix seq2seq trainer
* remove from tests
* remove lock
* remove train files
* delete test files
* correct typo
* check at init
* make sure trainer is not slowed down on TPU
* correct isort
* remove use cache
* fix use cache
* add last use cache = false
* model card German Sentence Embeddings V2
- for German RoBERTa for Sentence Embeddings V2
- marked old as outdated
* small correction
* small improvement in description
* small spelling fix
* spelling fix
* add evaluation results
* spearman explanation
* add number of trials
Updating the run_squad training script to handle the "longformer" `model_type`. The longformer is trained in the same way as RoBERTa, so I've added the "longformer" `model_type` (that's the right Hugging Face name for the Longformer model, right?) everywhere there was a "roberta" `model_type` reference. The longformer (like RoBERTa) doesn't use `token_type_ids` (as I understand from looking at the [longformer notebook](https://github.com/patil-suraj/Notebooks/blob/master/longformer_qa_training.ipynb)), which is what gets updated after this change.
This fix might be related to [this issue](https://github.com/huggingface/transformers/issues/7249) with SQuAD training when using run_squad.py
* WIP refactoring pipeline tests - switching to fast tokenizers
* fix dialog pipeline and fill-mask
* refactoring pipeline tests backbone
* make large tests slow
* fix tests (tf Bart inactive for now)
* fix doc...
* clean up for merge
* fixing tests - remove bart from summarization until there is TF
* fix quality and RAG
* Add new translation pipeline tests - fix JAX tests
* only slow for dialog
* Fixing the missing TF-BART imports in modeling_tf_auto
* spin out pipeline tests in separate CI job
* adding pipeline test to CI YAML
* add slow pipeline tests
* speed up tf and pt joint test to avoid redoing all the standalone pt and tf tests
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Update src/transformers/pipelines.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/pipelines.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add require_torch and require_tf in is_pt_tf_cross_test
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Start simplification
* More progress
* Finished script
* Address comments and update tests instructions
* Wrong test
* Accept files as inputs and fix test
* Update src/transformers/trainer_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Fix labels and add combined score
* Add special labels
* Update TPU command
* Revert to old label strategy
* Use model labels
* Fix for STS-B
* Styling
* Apply suggestions from code review
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Code styling
* Fix review comments
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Actually make the "translation", "translation_XX_to_YY" task behave correctly.
Background:
- Currently "translation_cn_to_ar" does not work. (only 3 pairs are
supported)
- Some models contain in their config the correct values for the (src, tgt) pair they can translate. It's usually just one pair, and we can infer it automatically from `model.config.task_specific_params`. If it's not defined we can probably still load the TranslationPipeline nevertheless.
Proposed fix:
- A simplified version of what could become a more general `parametrized` task: "translation" + (src, tgt) is what we need in this instance. The way we go about it for now is simply parsing "translation_XX_to_YY". If more cases of parametrized tasks arise, we should preferably move toward something closer to what `datasets` proposes, i.e. a secondary `task_options` argument that carries what the task requires.
- Should be backward compatible in all cases; for instance `pipeline(task="translation_en_to_de")` should work out of the box.
- Should provide a warning when a specific translation pair has been
selected on behalf of the user using
`model.config.task_specific_params`.
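A sketch of the "translation_XX_to_YY" parsing described above (illustrative only, not the actual pipeline code):

```python
import re


def parse_translation_task(task: str):
    """Return (src, tgt) for tasks of the form 'translation_XX_to_YY', else None."""
    match = re.match(r"^translation_([a-zA-Z]+)_to_([a-zA-Z]+)$", task)
    if match is None:
        return None
    return match.group(1), match.group(2)


# parse_translation_task("translation_en_to_de") -> ("en", "de")
```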
* Update src/transformers/pipelines.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* fix config save
* add test
* add config class variable and another test
* line break
* fix fsmt and typo
* god am I making many errors today :-/
* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Looking at the current community notebooks, it seems that few are targeted at absolute beginners and even fewer are written with TensorFlow. This notebook describes absolutely everything a beginner would need to know, including how to save/load their model and use it for new predictions (this is often omitted in tutorials).
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* slow tests should be slow
* exception note
* style
* integrate LysandreJik's notes with some expansions
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* another slow test
* fix link, and prose
* clarify.
* note from Sam
* typo
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* make the save_load special key tests common
* handle mbart
* cleaner solution
* fix
* move test_save_load_missing_keys back into fsmt for now
* restore
* style
* add marian
* add pegasus
* blenderbot
* revert - no static embed
I'm using transformers 3.3.1 and ran a training script with `--save_total_limit 3`. I hit the exception below, and after debugging the code I found that it wrongly tries to index into the `best_model_checkpoint`'s *str* rather than the `sorted_checkpoints` array. When running without the fix I got this exception:
```
Traceback (most recent call last):
File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 921, in _save_training
self._rotate_checkpoints(use_mtime=True)
File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 1283, in _rotate_checkpoints
checkpoints_sorted = self._sorted_checkpoints(use_mtime=use_mtime)
File "/<HOME>/.conda/envs/transformers/lib/python3.7/site-packages/transformers/trainer.py", line 1274, in _sorted_checkpoints
checkpoints_sorted[best_model_index],
TypeError: 'str' object does not support item assignment
```
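For context, a simplified sketch of the intended logic (not the actual Trainer code): the best checkpoint should be moved to the end of the oldest-first list so rotation never deletes it, and the indexing must happen on the list, not on the checkpoint path string:

```python
def sort_checkpoints_keep_best(checkpoints_sorted, best_model_checkpoint):
    """Move the best checkpoint to the end of the (oldest-first) list so rotation keeps it."""
    if best_model_checkpoint in checkpoints_sorted:
        best_index = checkpoints_sorted.index(best_model_checkpoint)
        # swap inside the list (the bug was indexing into the path string instead)
        checkpoints_sorted[best_index], checkpoints_sorted[-1] = (
            checkpoints_sorted[-1],
            checkpoints_sorted[best_index],
        )
    return checkpoints_sorted
```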
Saw an error when sending `decoder_config` as a parameter while initializing an encoder-decoder model from pretrained.
Fixed "UnboundLocalError: local variable 'decoder_config' referenced before assignment"
* add CustomHFIndex
* typo in config
* update tests
* add custom dataset example
* clean script
* update test data
* minor in test
* docs
* docs
* style
* fix imports
* allow to pass the indexed dataset directly
* update tests
* use multiset DPR
* address Thom's and Patrick's comments
* style
* update dpr tokenizer
* add output_dir flag in use_own_knowledge_dataset.py
* allow custom datasets in examples/rag/finetune.py
* add test for custom dataset in distributed rag retriever
* fix 5990
* accommodate iterable datasets without a predefined length
* support it as one use case: provide max_steps, and NO num_epochs
* Is a merge of master and PR 5995
* fix trainer test under TF
* fix only for torch
* TF trainer untouched
* trainer tests are skipped when no torch
* address comments
* fix quality checks
* remove torch.dataset from test_trainer
* unnecessary inheritance
* RegressionDataset implements all needed methods __len__ and __getitem__
* fix quality checks
* restore RegressionDataset
* was wrongly under is_torch_available()
* WIP flax bert
* Initial commit Bert Jax/Flax implementation.
* Embeddings working and equivalent to PyTorch.
* Move embeddings in its own module BertEmbeddings
* Added jax.jit annotation on forward call
* BertEncoder on par with PyTorch ! :D
* Add BertPooler on par with PyTorch !!
* Working Jax+Flax implementation of BertModel with < 1e-5 differences on the last layer.
* Fix pooled output to take only the first token of the sequence.
* Refactoring to use BertConfig from transformers.
* Renamed FXBertModel to FlaxBertModel
* Model is now initialized in FlaxBertModel constructor and reused.
* WIP JaxPreTrainedModel
* Cleaning up the code of FlaxBertModel
* Added ability to load Flax model saved through save_pretrained()
* Added ability to convert Pytorch Bert model to FlaxBert
* FlaxBert can now load every Pytorch Bert model with on-the-fly conversion
* Fix hardcoded shape values in conversion scripts.
* Improve the way we handle LayerNorm conversion from PyTorch to Flax.
* Added positional embeddings as parameter of BertModel with default to np.arange.
* Let's roll FlaxRoberta !
* Fix missing position_ids parameters on predict for Bert
* Flax backend now supports batched inputs
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Make it possible to load a msgpacked model when converting from PyTorch, as a last resort.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Moved save_pretrained to Jax base class along with more constructor parameters.
* Use specialized, model-dependent conversion function.
* Expose `is_flax_available` in file_utils.
* Added unittest for Flax models.
* Added run_tests_flax to the CI.
* Introduce FlaxAutoModel
* Added more unittests
* Flax models reference the _MODEL_ARCHIVE_MAP from the PyTorch model.
* Addressing review comments.
* Expose seed in both Bert and Roberta
* Fix typo suggested by @stefan-it
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
* Attempt to make style
* Attempt to make style in tests too
* Added jax & jaxlib to the flax optional dependencies.
* Attempt to fix flake8 warnings ...
* Redo black again and again
* When black and flake8 fight each other for a space ... 💥💥💥
* Try removing trailing comma to make both black and flake happy!
* Fix invalid is_<framework>_available call, thanks @LysandreJik 🎉
* Fix another invalid import in flax_roberta test
* Bump and pin flax release to 0.1.0.
* Make flake8 happy, remove unused jax import
* Change the type of the catch for msgpack.
* Remove unused import.
* Put seed as optional constructor parameter.
* trigger ci again
* Fix too many parameters in BertAttention.
* Formatting.
* Simplify Flax unittests to avoid machine crashes.
* Fix invalid number of arguments when raising issue for an unknown model.
* Address @bastings comment in PR, moving jax.jit decorated outside of __call__
* Fix incorrect path to require_flax/require_pytorch functions.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Attempt to make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct rebasing of circle-ci dependencies
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix import sorting.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix unused imports.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Again import sorting...
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Installing missing nlp dependency for flax unittests.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix loading of model for Flax implementations.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* jit the inner function call to make JAX-compatible
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Format !
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Flake one more time 🎶
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Rewrites BERT in Flax to the new Linen API (#7211)
* Rewrite Flax HuggingFace PR to Linen
* Some fixes
* Fix tests
* Fix CI with change of name of nlp (#7054)
* nlp -> datasets
* More nlp -> datasets
* Woopsie
* More nlp -> datasets
* One last
* Expose `is_flax_available` in file_utils.
* Added run_tests_flax to the CI.
* Attempt to make style
* trigger ci again
* Fix import sorting.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Revert "Rewrites BERT in Flax to the new Linen API (#7211)"
This reverts commit 23703a5eb3364e26a1cbc3ee34b4710d86a674b0.
* Remove jnp.lax references
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Reintroduce Linen changes ...
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use jax native's gelu function.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Renaming BertModel to BertModule to highlight the fact this is the Flax Module object.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Rewrite FlaxAutoModel test to not rely on pretrained_model_archive_map
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove unused variable in BertModule.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove unused variable in BertModule again
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Attempt to have is_flax_available working again.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Introduce JAX TensorType
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Improve ImportError message when trying to convert to various TensorType format.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Makes Flax model jittable.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Ensure flax models are jittable in unittests.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remove unused imports.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Ensure jax imports are guarded behind is_flax_available.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style again
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style again again
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style again again again
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Update src/transformers/file_utils.py
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
* Bump flax to its latest version
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
* Bump jax version to at least 0.2.0
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Update the unittest to use TensorType.JAX
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* isort import in tests.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Match new flax parameters name "params"
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove unused imports.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Add flax models to transformers __init__
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Attempt to address all CI related comments.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct circle.yml indent.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct circle.yml indent (2)
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove coverage from flax tests
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing many naming suggestions from comments
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Simplify for loop logic to iterate over layers in FlaxBertLayerCollection
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* use f-string syntax for formatting logs.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use config property from FlaxPreTrainedModel.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* use "cls_token" instead of "first_token" variable name.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* use "hidden_state" instead of "h" variable name.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct class reference in docstring to link to Flax related modules.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added HF + Google Flax team copyright.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make Roberta independent from Bert
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Move activation functions to flax_utils.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Move activation functions to flax_utils for bert.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added docstring for BERT
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Update import for Bert and Roberta tokenizers
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix-copies
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Correct FlaxRobertaLayer to match PyTorch.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use the same store_artifact for flax unittest
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Style.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make sure gradient are disabled only locally for flax unittest using torch equivalence.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use relative imports
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Stefan Schweter <stefan@schweter.it>
Co-authored-by: Marc van Zee <marcvanzee@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Propagating n_docs as a parameter to all RagModel-related functions, defaulting to self.config.n_docs
* Making the n_docs parameter default to None in the marginalize function
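The pattern boils down to the usual optional-argument fallback; a sketch (illustrative class, not the actual RAG code):

```python
class RagSketch:
    """Illustrative only: how n_docs falls back to the config value."""

    def __init__(self, config):
        self.config = config

    def marginalize(self, seq_logits, doc_scores, n_docs=None):
        # use the explicit argument if given, otherwise the model-wide default
        n_docs = n_docs if n_docs is not None else self.config.n_docs
        return n_docs  # the real method marginalizes seq_logits over n_docs documents
```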
* Fixing code quality issues
* Handle the special case when the generator is an instance of T5PreTrainedModel, which does not have n_docs as a parameter
* T5PreTrainedModel does not have n_docs as a parameter
* Addressing review comment
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Correcting comment by addressing review comment
* Adding assert statement verifying that n_docs is correctly set. n_docs should be the same for both retriever and generator.
* Fixing flake8 reported issue
* Correcting test datasets for rag
* Using doc_scores instead of context_input_ids in the assert, since in RagSequenceForGeneration context_input_ids can be null
* doc_scores' second dimension is the number of retrieved docs
* Changing assert comment
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* splitting fast and slow tokenizers [WIP]
* [WIP] splitting sentencepiece and tokenizers dependencies
* update dummy objects
* add name_or_path to models and tokenizers
* prefix added to file names
* prefix
* styling + quality
* splitting all the tokenizer files - sorting sentencepiece-based ones
* update tokenizer version up to 0.9.0
* remove hard dependency on sentencepiece 🎉
* and removed hard dependency on tokenizers 🎉
* update conversion script
* update missing models
* fixing tests
* move test_tokenization_fast to main tokenization tests - fix bugs
* bump up tokenizers
* fix bert_generation
* update and fix several tokenizers
* keep sentencepiece in deps for now
* fix funnel and deberta tests
* fix fsmt
* fix marian tests
* fix layoutlm
* fix squeezebert and gpt2
* fix T5 tokenization
* fix xlnet tests
* style
* fix mbart
* bump up tokenizers to 0.9.2
* fix model tests
* fix tf models
* fix seq2seq examples
* fix tests without sentencepiece
* fix slow => fast conversion without sentencepiece
* update auto and bert generation tests
* fix mbart tests
* fix auto and common test without tokenizers
* fix tests without tokenizers
* clean up tests; lighten them up when tokenizers + sentencepiece are both off
* style quality and tests fixing
* add sentencepiece to doc/examples reqs
* leave sentencepiece on for now
* style, quality, split herbert and fix pegasus
* WIP Herbert fast
* add sample_text_no_unicode and fix herbert tokenization
* skip FSMT example test for now
* fix style
* fix fsmt in example tests
* update following Lysandre and Sylvain's comments
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* HerBERT transformer model for Polish language understanding.
* HerbertTokenizerFast generated with HerbertConverter
* Herbert base and large model cards
* Herbert model cards with tags
* Herbert tensorflow models
* Herbert model tests based on Bert test suit
* src/transformers/tokenization_herbert.py edited online with Bitbucket
* src/transformers/tokenization_herbert.py edited online with Bitbucket
* docs/source/model_doc/herbert.rst edited online with Bitbucket
* Herbert tokenizer tests and bug fixes
* src/transformers/configuration_herbert.py edited online with Bitbucket
* Copyrights and tests for TFHerbertModel
* model_cards/allegro/herbert-base-cased/README.md edited online with Bitbucket
* model_cards/allegro/herbert-large-cased/README.md edited online with Bitbucket
* Bug fixes after testing
* Reformat modified_only_fixup
* Proper order of configuration
* Herbert proper documentation formatting
* Formatting with make modified_only_fixup
* Dummies fixed
* Adding missing models to documentation
* Removing HerBERT model as it is a simple extension of BERT
* Update model_cards/allegro/herbert-base-cased/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Update model_cards/allegro/herbert-large-cased/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* HerbertTokenizer deprecated configuration removed
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
in `tests/test_utils_check_copies.py` I was getting intermittently:
```
utils/check_copies.py:52
/mnt/nvme1/code/transformers-comet/utils/check_copies.py:52: DeprecationWarning: invalid escape sequence \s
while line_index < len(lines) and re.search(f"^{indent}(class|def)\s+{name}", lines[line_index]) is None:
```
So this should fix it.
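The usual remedy for this warning, and presumably what the fix amounts to, is making the pattern a raw (rf) string so `\s` is no longer an invalid string escape; a small sketch with made-up example values:

```python
import re

indent, name = "    ", "forward"  # example values
line = "    def forward(self, x):"

# rf-string: f-string interpolation without string-escape warnings on \s
pattern = rf"^{indent}(class|def)\s+{name}"
print(re.search(pattern, line) is not None)  # True
```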
* model card for bert-base-NER
* add meta data up top
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
- TFAutoModelForCausalLM
- TFAutoModelForMaskedLM
- TFAutoModelForSeq2SeqLM
as per deprecation warning. No tests as it simply removes current
warnings from tests.
* Improving Pipelines by defaulting to framework='tf' when PyTorch seems unavailable.
* Actually changing the default resolution order to account for model defaults
Adding a new test for each pipeline to check that pipeline(task) works too, without manually specifying the framework.
* use DDP no_sync when possible
* fix is_nlp_available addition mistake
* reformat trainer.py
* reformat trainer.py
* drop support for pytorch < 1.2
* return support for pytorch < 1.2
* Add Documentation for GPT-1 Classification
* Add GPT-1 with Classification head
* Add tests for GPT-1 Classification
* Add GPT-1 For Classification to auto models
* Remove authorized missing keys, change checkpoint to openai-gpt
Added is_torch_tpu_available() to the condition for saving a model as an XLA model. The "xla_device" property of the config can also be True on a non-XLA device, when loading a checkpoint that was trained on XLA before.
Resolves #7695
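Schematically, the guard becomes something like the following (a sketch, not the exact Trainer code; `is_torch_tpu_available` is the library utility of that era, the helper name is made up):

```python
from transformers.file_utils import is_torch_tpu_available


def should_save_as_xla(config) -> bool:
    # "xla_device" can be True in a config trained on XLA even when we are
    # currently on a non-XLA device, so also require an actual TPU runtime.
    return bool(getattr(config, "xla_device", False)) and is_torch_tpu_available()
```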
* Import integration libraries first
* isort and black happiness
* flake8 happiness
* Add a test
* Black reformat
* Ignore import order in tests
* A heavy-handed method of disabling comet for tests
* Remove comet_ml tests
* Run black on setup.py
* Reintroduce clean_text call which was removed by mistake in #4723
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added unittest for clean_text parameter on Bert tokenizer.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Better unittest name.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Adapt unittest to use untrained tokenizer.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Code quality + update test
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* [WIP] SP tokenizers
* fixing tests for T5
* WIP tokenizers
* serialization
* update T5
* WIP T5 tokenization
* slow to fast conversion script
* Refactoring to move tokenzier implementations inside transformers
* Adding gpt - refactoring - quality
* WIP adding several tokenizers to the fast world
* WIP Roberta - moving implementations
* update to dev4; switch file loading to in-memory loading
* Updating and fixing
* advancing on the tokenizers - updating do_lower_case
* style and quality
* moving forward with tokenizers conversion and tests
* MBart, T5
* dumping the fast version of transformer XL
* Adding to autotokenizers + style/quality
* update init and space_between_special_tokens
* style and quality
* bump up tokenizers version
* add protobuf
* fix pickle Bert JP with Mecab
* fix newly added tokenizers
* style and quality
* fix bert japanese
* fix funnel
* limit tokenizer warning to one occurrence
* clean up file
* fix new tokenizers
* fast tokenizers deep tests
* WIP adding all the special fast tests on the new fast tokenizers
* quick fix
* adding more fast tokenizers in the fast tests
* all tokenizers in fast version tested
* Adding BertGenerationFast
* bump up setup.py for CI
* remove BertGenerationFast (too early)
* bump up tokenizers version
* Clean old docstrings
* Typo
* Update following Lysandre comments
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
* Replaced torch.load with pickle.load for loading the pretrained vocab of TransformerXL
* Replaced torch.save with pickle.dump when saving the vocabulary
* updating transformer-xl
* uploaded on S3 - compatibility
* fix tests
* style
* Address review comments
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* [Model card] SinhalaBERTo model.
This is the model card for keshan/SinhalaBERTo model.
* Update model_cards/keshan/SinhalaBERTo/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Initial callback proposal
* Finish various callbacks
* Post-rebase conflicts
* Fix tests
* Don't use something that's not set
* Documentation
* Remove unwanted print.
* Document all models can work
* Add tests + small fixes
* Update docs/source/internal/trainer_utils.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Fix TF tests
* Real fix this time
* This one should work
* Fix typo
* Really fix typo
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
- Use cuda:10.2 image instead of 10.1 (to address version mismatch
warning with pytorch)
- Use devel version that is built on the runtime and includes headers
and development tools (was otherwise failing to build apex)
* Create README.md
Model description for all LEGAL-BERT models, published as part of "LEGAL-BERT: The Muppets straight out of Law School", Chalkidis et al., 2020, in Findings of EMNLP 2020.
* Update model_cards/nlpaueb/legal-bert-base-uncased/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Fixing top_k and min_length assertions, and a typo fix
* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* PoC on RAG
* Format class name/obj name
* Better name in message
* PoC on one TF model
* Add PyTorch and TF dummy objects + script
* Treat scikit-learn
* Bad copy pastes
* Typo
'The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.'
I don't know how to change the 'How to use this model directly from the 🤗/transformers library:' part since it is not part of the model-paper
* 🚩 Add `power` argument for TF PolynomialDecay
* 🚩 Create default optimizer with power
* 🚩 Add argument to training args
* 🚨 Clean code format
* 🚨 Fix black warning
* 🚨 Fix code format
* configuration_squeezebert.py
thin wrapper around bert tokenizer
fix typos
wip sb model code
wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working
set up squeezebert to use BertModelOutput when returning results.
squeezebert documentation
formatting
allow head mask that is an array of [None, ..., None]
docs
docs cont'd
path to vocab
docs and pointers to cloud files (WIP)
line length and indentation
squeezebert model cards
formatting of model cards
untrack modeling_squeezebert_scratchpad.py
update aws paths to vocab and config files
get rid of stub of NSP code, and advise users to pretrain with mlm only
fix rebase issues
redo rebase of modeling_auto.py
fix issues with code formatting
more code format auto-fixes
move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert
tests for squeezebert modeling and tokenization
fix typo
move squeezebert before bert in modeling_auto.py to fix inheritance problem
disable test_head_masking, since squeezebert doesn't yet implement head masking
fix issues exposed by the test_modeling_squeezebert.py
fix an issue exposed by test_tokenization_squeezebert.py
fix issue exposed by test_modeling_squeezebert.py
auto generated code style improvement
issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()
update copyright
resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask
docs
add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli
autogenerated formatting tweaks
integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings
* tiny change to order of imports
* LayoutLM: add exception handling for bbox values
To replicate the unhandled error:
- In `test_modeling_layoutlm.py` set `range_bbox=1025`, i.e. greater than 1024
- Run `pytest tests/test_modeling_layoutlm.py`
The requirement for bbox values to be within the range 0-1000 is documented, but if it is violated it is not clear from the error message what the issue is.
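A sketch of the kind of early check meant here (illustrative, not the exact code added to `modeling_layoutlm.py`; it assumes a nested list of [x0, y0, x1, y1] boxes):

```python
def check_bbox_values(bbox):
    """Raise a clear error when any bounding-box coordinate falls outside 0-1000."""
    for box in bbox:
        for coordinate in box:
            if not 0 <= coordinate <= 1000:
                raise ValueError(
                    f"Bounding box coordinate {coordinate} is outside the documented 0-1000 range."
                )
```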
* Update src/transformers/modeling_layoutlm.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Trainer should not modify its TrainingArguments
* Trainer should not modify its TrainingArguments
* Trainer should not modify its TrainingArguments
* Add test of resumed training
* Fixes
* Non multiGPU test
* Clean Trainer state
* Add more to the state
* Documentation
* One last test
* Make resume training test more complete
* Unwanted changes
* t5 community notebook added
* author link updated
* t5 community notebook added
* author link updated
* new colab link updated
Co-authored-by: harris <muhammad.harris@visionx.io>
* GPT2 gradient checkpointing
* find_unused_parameters removed if checkpointing
* find_unused_parameters removed if checkpointing
* Update src/transformers/configuration_gpt2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Added a test for generation with checkpointing
* Update src/transformers/configuration_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [makefile] check/fix only files modified since branching
* fix phonies
* parametrize dirs
* have only one source for dirs to check
* look ma, no autoformatters here
* [code quality] merge style and quality targets
Any reason why we don't run `flake8` in `make style`? I find myself needing to run `make style` and `make quality` all the time, but I need the latter just for the last 2 checks. Since we have no control over the source code why bother with separating checking and fixing - let's just have one target that fixes and then performs the remaining checks, as we know the first two have been done already.
This PR suggests to merge the 2 targets into one efficient target.
I will edit the docs if this change resonates with the team.
* move checks into style, re-use target
* better name
* add fixup target
* document new target
Previously, the TFTrainingArguments object did a check to see if XLA was enabled, but did this by referencing `self.args.xla`, when it should be `self.xla`, because it is the args object. This can be verified a few lines above, where the XLA field is set.
* Changed the name of all no_... arguments and all references to them, inverting the boolean condition
* Change benchmark tests to use new Benchmark Args
* Update src/transformers/benchmark/benchmark_args_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/benchmark/benchmark.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix Style. Add --no options in help
* fix some part of tests
* Update src/transformers/benchmark/benchmark_args_utils.py
* Update src/transformers/benchmark/benchmark_args_utils.py
* Update src/transformers/benchmark/benchmark_args_utils.py
* fix all tests
* make style
* add backwards compatibility
* make backwards compatible
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: fmcurti <fcurti@DESKTOP-RRQURBM.localdomain>
* Ensure that integrations are imported before transformers or ML libs
* Black reformatter wanted a newline
* isort requests
* black requests
* flake8 requests
* Clean up model documentation
* Formatting
* Preparation work
* Long lines
* Main work on rst files
* Cleanup all config files
* Syntax fix
* Clean all tokenizers
* Work on first models
* Models beginning
* FlauBERT
* All PyTorch models
* All models
* Long lines again
* Fixes
* More fixes
* Update docs/source/model_doc/bert.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update docs/source/model_doc/electra.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Last fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* skip decorators: docs, tests, bugs
* another important note
* style
* bloody style
* add @pytest.mark.parametrize
* add note
* no idea what it wants :(
* fix confused flake
We run `black --target-version py35 ...` but flake8 doesn't know that, so currently with py38 flake8 fails, suggesting that black should have reformatted 63 files. Indeed, if I run:
```
black --line-length 119 --target-version py38 examples templates tests src utils
```
it reformats 63 files.
The only solution I found is to create a black config file, as explained at https://github.com/psf/black#configuration-format, which is what this PR adds (a sketch of such a config follows).
Now flake8 knows that py35 is the target and no longer gets confused, regardless of the user's Python version.
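For reference, a minimal sketch of such a config (the exact `pyproject.toml` contents committed may differ; the values below just mirror the flags quoted above):
```
[tool.black]
line-length = 119
target-version = ['py35']
```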
* adjust the other files that will now rely on black's config file
* Add dataloader_num_workers to TrainingArguments
This argument is meant to be used to set the number of workers for the PyTorch DataLoader (a short usage sketch follows this list).
* Pass num_workers argument on DataLoader init
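A minimal usage sketch, assuming a writable `./out` directory; the argument is forwarded to `torch.utils.data.DataLoader(num_workers=...)`:
```
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./out",
    dataloader_num_workers=4,  # 0 (the default) keeps data loading in the main process
)
print(args.dataloader_num_workers)
```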
* added rag WIP (an end-to-end usage sketch follows this list)
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* path fix
* Formatting / renaming prior to actual work
* added rag WIP
* Formatting / renaming prior to actual work
* First commit
* improve comments
* Retrieval evaluation scripts
* refactor to include modeling outputs + MPI retriever
* Fix rag-token model + refactor
* Various fixes + finetuning logic
* use_bos fix
* Retrieval refactor
* Finetuning refactoring and cleanup
* Add documentation and cleanup
* Remove set_up_rag_env.sh file
* Fix retrieval with HF index
* Fix import errors
* Fix quality errors
* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867
* fix quality
* Fix RAG Sequence generation
* minor cleanup plus initial tests
* fix test
* fix tests 2
* Comments fix
* post-merge fixes
* Improve readme + post-rebase refactor
* Extra dependencies for tests
* Fix tests
* Fix tests 2
* Refactor test requirements
* Fix tests 3
* Post-rebase refactor
* rename nlp->datasets
* RAG integration tests
* add tokenizer to slow integration test and allow retriever to run on cpu
* add tests; fix position ids warning
* change structure
* change structure
* add from encoder generator
* save working solution
* make all integration tests pass
* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained
* don't save paths
* delete unnecessary imports
* pass config to AutoTokenizer.from_pretrained for Rag tokenizers
* init wiki_dpr only once
* hardcode legacy index and passages paths (todo: add the right urls)
* finalize config
* finalize retriver api and config api
* LegacyIndex index download refactor
* add dpr to autotokenizer
* make from pretrained more flexible
* fix ragfortokengeneration
* small name changes in tokenizer
* add labels to models
* change default index name
* add retrieval tests
* finish token generate
* align test with previous version and make all tests pass
* add tests
* finalize tests
* implement thoms suggestions
* add first version of test
* make first tests work
* make retriever platform agnostic
* naming
* style
* add legacy index URL
* docstrings + simple retrieval test for distributed
* clean model api
* add doc_ids to retriever's outputs
* fix retrieval tests
* finish model outputs
* finalize model api
* fix generate problem for rag
* fix generate for other models
* fix some tests
* save intermediate
* set generate to default
* big refactor generate
* delete rag_api
* correct pip faiss install
* fix auto tokenization test
* fix faiss install
* fix test
* move the distributed logic to examples
* model page
* docs
* finish tests
* fix dependencies
* fix import in __init__
* Refactor eval_rag and finetune scripts
* start docstring
* add psutil to test
* fix tf test
* move require torch to top
* fix retrieval test
* align naming
* finish automodel
* fix repo consistency
* test ragtokenizer save/load
* add rag model output docs
* fix ragtokenizer save/load from pretrained
* fix tokenizer dir
* remove torch in retrieval
* fix docs
* fix finetune scripts
* finish model docs
* finish docs
* remove auto model for now
* add require torch
* remove solved todos
* integrate sylvains suggestions
* sams comments
* correct mistake on purpose
* improve README
* Add generation test cases
* fix rag token
* clean token generate
* fix test
* add note to test
* fix attention mask
* add t5 test for rag
* Fix handling prefix in finetune.py
* don't overwrite index_name
Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>
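Putting the pieces above together, a hedged end-to-end sketch of the resulting RAG API, following the pattern documented for `facebook/rag-sequence-nq` (the dummy dataset keeps the index download small; option names are assumed from the docs of that era):
```
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

inputs = tokenizer.prepare_seq2seq_batch("who holds the record in 100m freestyle", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```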
* Copy code from Bert to Roberta and add safeguard script
* Fix docstring
* Comment code
* Formatting
* Update src/transformers/modeling_roberta.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add test and fix bugs
* Fix style and make new comand
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* fix USE_CUDA, add pipeline
* USE_CUDA fix
* recode SinusoidalPositionalEmbedding into nn.Embedding subclass
This was needed for torchscript to work - the embedding is now part of the state_dict, so we will have to remove these keys during save_pretrained
* back out (ci debug)
* restore
* slow last?
* facilitate not saving certain keys and test
* remove no longer used keys
* style
* fix logging import
* cleanup
* Update src/transformers/modeling_utils.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* fix bug in max_positional_embeddings
* rename keys to keys_to_never_save per suggestion, improve the setup
* Update src/transformers/modeling_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Two new pre-trained models, "vinai/bertweet-covid19-base-cased" and "vinai/bertweet-covid19-base-uncased", result from further pre-training "vinai/bertweet-base" on a corpus of 23M COVID-19 English tweets for 40 epochs.
* Add BERTweet and PhoBERT models
* Update modeling_auto.py
Re-add `bart` to LM_MAPPING
* Update tokenization_auto.py
Re-add `from .configuration_mobilebert import MobileBertConfig`
not sure why it's replaced by `from transformers.configuration_mobilebert import MobileBertConfig`
* Add BERTweet and PhoBERT to pretrained_models.rst
* Update tokenization_auto.py
Remove BertweetTokenizer and PhobertTokenizer from tokenization_auto.py (they are currently not supported by AutoTokenizer).
* Update BertweetTokenizer - without nltk
* Update model card for BERTweet
* PhoBERT - with Auto mode - without import fastBPE
* PhoBERT - with Auto mode - without import fastBPE
* BERTweet - with Auto mode - without import fastBPE
* Add PhoBERT and BERTweet to TF modeling auto
* Improve Docstrings for PhobertTokenizer and BertweetTokenizer
* Update PhoBERT and BERTweet model cards
* Fixed a merge conflict in tokenization_auto
* Used black to reformat BERTweet- and PhoBERT-related files
* Used isort to reformat BERTweet- and PhoBERT-related files
* Reformatted BERTweet- and PhoBERT-related files based on flake8
* Updated test files
* Updated test files
* Updated tf test files
* Updated tf test files
* Updated tf test files
* Updated tf test files
* Update commits from huggingface
* Delete unnecessary files
* Add tokenizers to auto and init files
* Add test files for tokenizers
* Revised model cards
* Update save_vocabulary function in BertweetTokenizer and PhobertTokenizer and test files
* Revised test files
* Update orders of Phobert and Bertweet tokenizers in auto tokenization file
* [model cards] ported allenai Deep Encoder, Shallow Decoder models
* typo
* fix references
* add allenai/wmt19-de-en-6-6 model cards
* fill-in the missing info for the build script as provided by the searcher.
* ready for PR
* cleanup
* correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST
* fix
* perfectionism
* revert change from another PR
* odd, already committed this one
* non-interactive upload workaround
* backup the failed experiment
* store langs in config
* workaround for localizing model path
* doc clean up as in https://github.com/huggingface/transformers/pull/6956
* style
* back out debug mode
* document: run_eval.py --num_beams 10
* remove unneeded constant
* typo
* re-use bart's Attention
* re-use EncoderLayer, DecoderLayer from bart
* refactor
* send to cuda and fp16
* cleanup
* revert (moved to another PR)
* better error message
* document run_eval --num_beams
* solve the problem of tokenizer finding the right files when model is local
* polish, remove hardcoded config
* add a note that the file is autogenerated to avoid losing changes
* prep for org change, remove unneeded code
* switch to model4.pt, update scores
* s/python/bash/
* missing init (but doesn't impact the finetuned model)
* cleanup
* major refactor (reuse-bart)
* new model, new expected weights
* cleanup
* cleanup
* full link
* fix model type
* merge porting notes
* style
* cleanup
* have to create a DecoderConfig object to handle vocab_size properly
* doc fix
* add note (not a public class)
* parametrize
* - add bleu scores integration tests
* skip test if sacrebleu is not installed
* cache heavy models/tokenizers
* some tweaks
* remove tokens that aren't used
* more purging
* simplify code
* switch to using decoder_start_token_id
* add doc
* Revert "major refactor (reuse-bart)"
This reverts commit 226dad15ca6a9ef4e26178526e878e8fc5c85874.
* decouple from bart
* remove unused code #1
* remove unused code #2
* remove unused code #3
* update instructions
* clean up
* move bleu eval to examples
* check import only once
* move data+gen script into files
* reuse via import
* take less space
* add prepare_seq2seq_batch (auto-tested)
* cleanup
* recode test to use json instead of yaml
* ignore keys not needed
* use the new -y in transformers-cli upload -y
* [xlm tok] config dict: fix str into int to match definition (#7034)
* [s2s] --eval_max_generate_length (#7018)
* Fix CI with change of name of nlp (#7054)
* nlp -> datasets
* More nlp -> datasets
* Woopsie
* More nlp -> datasets
* One last
* extending to support allen_nlp wmt models
- allow a specific checkpoint file to be passed
- more arg settings
- scripts for allen_nlp models
* sync with changes
* s/fsmt-wmt/wmt/ in model names
* s/fsmt-wmt/wmt/ in model names (p2)
* s/fsmt-wmt/wmt/ in model names (p3)
* switch to a better checkpoint
* typo
* make non-optional args truly non-optional - adjust tests where possible or skip when there is no other choice
* consistency
* style
* adjust header
* cards moved (model rename)
* use best custom hparams
* update info
* remove old cards
* cleanup
* s/stas/facebook/
* update scores
* s/allen_nlp/allenai/
* url maps aren't needed
* typo
* move all the doc / build /eval generators to their own scripts
* cleanup
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* fix indent
* duplicated line
* style
* use the correct add_start_docstrings
* oops
* resizing can't be done with the core approach, due to 2 dicts
* check that the arg is a list
* style
* style
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Removed 'tgt_len' and 'ext_len' from Transfomer-XL
* Some changes are still to be done
* Removed 'tgt_len' and 'ext_len' from Transfomer-XL (2)
* Removed comments
* Fixed quality
* Changed warning to info
* added multilabel classification using distilbert notebook to community notebooks
* added multilabel classification using distilbert notebook to community notebooks
```
/home/circleci/.local/lib/python3.6/site-packages/isort/main.py:915: UserWarning: W0501: The following deprecated CLI flags were used and ignored: --recursive!
"W0501: The following deprecated CLI flags were used and ignored: "
```
* fix ZeroDivisionError and epoch counting
* Add test for num_train_epochs calculation in trainer.py
* Remove @require_non_multigpu for test_num_train_epochs_in_training
* create branch for issue #6968
* First attempt to fix incorrect tf trainer loss calculation
* Fix training loss in metric
* fix tf trainer evaluation loss
* apply count_instances_in_batch() for eval and test datasets
* prototype of using a new argument in trainer_tf.py to fix loss issue
* some renaming and fix, in particular for evaluation methods
* fix bugs to have a running version
* change to @staticmethod
* apply style
* Add Tuna Mirror for Downloads from China
* format fix
* Use preset instead of hardcoding URL
* Fix
* make style
* update the mirror option doc
* update the mirror
* adding demo
* Update examples/lxmert/requirements.txt
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update examples/lxmert/checkpoint.sh
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* added user input for .py demo
* updated model loading, data extraction, checkpoints, and lots of other automation
* adding normalizing for bounding boxes
* Update requirements.txt
* some optimizations for extracting data
* added data extracting file
* added data extraction file
* minor fixes to reqs and readme
* Style
* remove options
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Add tests and fix various bugs in ModelOutput
* Update tests/test_model_output.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix to ensure that returned tensors after tokenization are Long
* fix to ensure that returned tensors after tokenization are Long
Co-authored-by: Ashwin Geet Dsa <adsa@grvingt-6.nancy.grid5000.fr>
* add dataset for albert pretrain
* datacollator for albert pretrain
* naming, comprehension, file reading change
* data cleaning is not needed after this modification
* delete prints
* fix a bug
* file structure change
* add tests for albert datacollator
* remove random seed
* add back len and get item function
* sample file for testing and test code added
* format change for black
* more format change
* Style
* var assignment issue resolve
* add back wrongly deleted DataCollatorWithPadding in init file
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Currently beam search returns inconsistent outputs - if hypotheses have different lengths we get an eos token, and if they have the same length we don't.
This PR makes the output consistent.
Also, why not replace:
```
if sent_lengths[i] < max_length:
    decoded[i, sent_lengths[i]] = eos_token_id
```
with:
```
decoded[i, sent_lengths[i]] = eos_token_id
```
Shouldn't eos always be there? If the data gets truncated, the caller needs to use a larger `max_length`.
Please correct me if my logic is flawed.
* Should check if `torch` is available
* fixed samples_count error, distributed_concat arguments
* style
* Import torch at beginning of file
Co-authored-by: TevenLeScao <teven.lescao@gmail.com>
* Initial model
* Fix upsampling
* Add special cls token id and test
* Formatting
* Test and first FunnelTokenizerFast
* Common tests
* Fix the check_repo script and document Funnel
* Doc fixes
* Add all models
* Write doc
* Fix test
* Initial model
* Fix upsampling
* Add special cls token id and test
* Formatting
* Test and first FunnelTokenizerFast
* Common tests
* Fix the check_repo script and document Funnel
* Doc fixes
* Add all models
* Write doc
* Fix test
* Fix copyright
* Forgot some layers can be repeated
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/modeling_funnel.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Update src/transformers/modeling_funnel.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
* Update src/transformers/modeling_funnel.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Slow integration test
* Make small integration test
* Formatting
* Add checkpoint and separate classification head
* Formatting
* Expand list, fix link and add in pretrained models
* Styling
* Add the model in all summaries
* Typo fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* fixed trainer tr_loss memory leak
* detached returned training loss from computation graph in the Trainer class' training_step() method
* Revert "fixed trainer tr_loss memory leak"
This reverts commit 47226e4e
ParsBERT v2.0 is a fine-tuned, vocab-reconstructed version of ParsBERT, and it can be used in other scopes!
It includes these features:
- We added some unused vocab entries for use in summarization and other scopes.
- We fine-tuned the model on a wide variety of writing styles in the Persian language.
My flake8 wasn't up to date enough, so `make quality` wasn't reporting the same things CI did - this PR adds the actual required version.
Thinking more about some of these minimal versions - CI will always install afresh and thus will always run the latest version. Is there a way to tell pip to always install the latest versions of certain dependencies on `pip install -e ".[dev]"`, rather than hardcoding the minimums, which quickly become outdated?
* [gen utils] missing else case
1. `else` is missing - I hit that case while porting a model. Probably needs to assert there?
2. also the comment on top seems to be outdated (just vocab_size is being set there)
* typo
unittest doesn't support pytest's super-handy `@pytest.mark.parametrize`. I researched this and there are many proposed workarounds, most tedious at best. If we include https://pypi.org/project/parameterized/ in the dev dependencies, it will provide a very easy way to write parameterized tests, same as pytest's fixtures, plus quite a few other styles.
Example:
```
import math

from parameterized import parameterized

@parameterized([
    (2, 2, 4),
    (2, 3, 8),
    (1, 9, 1),
    (0, 9, 0),
])
def test_pow(base, exponent, expected):
    assert math.pow(base, exponent) == expected
```
(add an extra `self` argument if inside a test class)
As a reminder, the pytest style is slightly different:
```
@pytest.mark.parametrize("test_input,expected", [("3+5", 8), ("2+4", 6), ("6*9", 42)])
def test_eval(test_input, expected):
    assert eval(test_input) == expected
```
More examples here: https://pypi.org/project/parameterized
May I suggest that it will make it much easier to write some types of tests?
* Create Readme.MD for KanBERTo
KanBERTo language model readme for Kannada language.
* Update model_cards/Naveen-k/KanBERTo/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Remove hard-coded uses of float32 to fix mixed precision use in TF Distilbert
* fix style
* fix gelu dtype issue in TF Distilbert
* fix numeric overflow while using half precision
Since `generate()` does:
```
num_beams = num_beams if num_beams is not None else self.config.num_beams
```
This test fails if `model.config.num_beams > 1` (which is the case in the model I'm porting).
This fix makes the test setup unambiguous by passing an explicit `num_beams=1` to `generate()`.
Thanks.
* Add cache_dir to save features TextDataset
This is for the case where the dataset lives on a read-only filesystem, which is the case
in tests (GKE TPU tests); a short usage sketch follows this list.
* style
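A minimal usage sketch, assuming a local `train.txt` file (illustrative path) and a writable cache location:
```
from transformers import AutoTokenizer, TextDataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="train.txt",           # assumed to exist; may live on a read-only filesystem
    block_size=128,
    cache_dir="/tmp/text_features",  # cached features go here instead of next to the data
)
print(len(dataset))
```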
* Introduce HPO checkpointing for PBT
* Moved checkpoint saving
* Fixed checkpoint subdir pass
* Fixed style
* Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT
* Adjust number of GPUs to number of jobs
* Avoid mode pickling in ray
* Move hp search to integrations
* Only access loss tensor every logging_steps
* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors, as it's terrible for performance, causing TPU<>CPU
communication at each step. On RoBERTa MLM, for example, it reduces step
time by 30%; the gain should be larger for models/tasks with smaller step times.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss across eval shards
* Fix style (#6803)
* t5 model should make decoder_attention_mask (#6800)
* [s2s] Test hub configs in self-scheduled CI (#6809)
* [s2s] round runtime in run_eval (#6798)
* Pegasus finetune script: add --adafactor (#6811)
* [bart] rename self-attention -> attention (#6708)
* [tests] fix typos in inputs (#6818)
* Fixed open in colab link (#6825)
* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)
* BR_BERTo model card (#6793)
* clearly indicate shuffle=False (#6312)
* Clarify shuffle
* clarify shuffle
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* [s2s README] Add more dataset download instructions (#6737)
* Style
* Patch logging issue
* Set default logging level to `WARNING` instead of `INFO`
* TF Flaubert w/ pre-norm (#6841)
* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Fix in Adafactor docstrings (#6845)
* Fix resuming training for Windows (#6847)
* Only access loss tensor every logging_steps
* tensor.item() was being called every step. This must not be done
for XLA:TPU tensors, as it's terrible for performance, causing TPU<>CPU
communication at each step. On RoBERTa MLM, for example, it reduces step
time by 30%; the gain should be larger for models/tasks with smaller step times.
* Train batch size was not correct in case a user uses the
`per_gpu_train_batch_size` flag
* Avg reduce loss across eval shards
* comments
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Improved tokenization with sacremoses
* The TransfoXLTokenizer is now using sacremoses for tokenization
* Added tokenization of comma-separated and floating point numbers.
* Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
* Added corresponding tests
* Removed test comparing TransfoXLTokenizer and TransfoXLTokenizerFast
* Added deprecation warning to TransfoXLTokenizerFast
* isort change
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM -- reduced memory consumption compared to Adam (a usage sketch follows this list).
* update PR fixes, add basic test
* bug -- incorrect params in test
* bugfix -- import Adafactor into test
* bugfix -- removed accidental T5 include
* resetting T5 to master
* bugfix -- include Adafactor in __init__
* longer loop for adafactor test
* remove double error class declare
* lint
* black
* isort
* Update src/transformers/optimization.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* single docstring
* Cleanup docstring
Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
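A hedged usage sketch with the fairseq-style defaults described above (the tiny linear model is just a stand-in so the snippet runs):
```
import torch
from transformers import Adafactor

model = torch.nn.Linear(10, 2)  # stand-in model with a few parameters
# With relative_step=True the optimizer computes its own step size, so lr stays None
optimizer = Adafactor(
    model.parameters(), lr=None, scale_parameter=True, relative_step=True, warmup_init=True
)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```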
* add tf graph compile tests
* fix conflict
* remove more tf transpose statements
* fix conflicts
* fix comment typos
* move function to class function
* fix black
* fix black
* make style
* add tie_word_embeddings
* correct word embeddings in modeling utils
* make style
* make config param only relevant for torch
* make style
* correct typo
* delete deprecated arg in transo-xl
* Allow tests in examples to use cuda or fp16,if they are available
The tests in examples didn't use CUDA or fp16 even when they were available.
- The text classification example (`run_glue.py`) didn't use fp16 even when it was available, but
the device was chosen based on availability (cuda/cpu).
- The language-modeling example (`run_language_modeling.py`) had a `--no_cuda` argument
which made the test run without CUDA. This example has an issue when running with fp16,
so it is not enabled (we got an assertion error because the perplexity was too high).
- CUDA and fp16 are not enabled for the question-answering example (`run_squad.py`) as there is a
difference in the F1 score.
- The text-generation example (`run_generation.py`) will use CUDA or fp16 whenever available.
Resolves some of: #5057
* Unwanted import of is_apex_available was removed
* Made changes to the examples test file to pass --fp16 only if CUDA and apex are available
- run_glue.py: Removed the check for cuda and fp16.
- run_generation.py: Removed the check for cuda and fp16 also removed unwanted flag creation.
* Incorrectly sorted imports fixed
* The model needs to be converted to half precision
* Formatted single line if condition statement to multiline
* The torch_device also needed to be checked before running the test on examples
- The tests in examples which use CUDA should also depend on the USE_CUDA flag,
similarly to the rest of the test suite. Even if we decide to set USE_CUDA to
True by default, setting USE_CUDA to False should result in the examples not using CUDA
* Format some of the code in test_examples file
* The improper import of is_apex_available was sorted
* Formatted the code to keep the style standards
* The comma at the end of list giving a flake8 issue was fixed
* Import sort was fixed
* Removed the clean_test_dir function as it's not used right now
* Add model card for singbert.
Adding a model card for singbert - BERT for Singlish and Manglish.
* Update README.md
Add additional tags and model name.
* Update README.md
Fix tag for malay.
* Update model_cards/zanelim/singbert/README.md
Fix language
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Add examples and custom widget input.
Add examples and custom widget input.
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Create PULL_REQUEST_TEMPLATE.md
Proposing to copy this neat feature from pytorch. This is a small template that lets a PR submitter state which issue the PR closes.
* Update .github/PULL_REQUEST_TEMPLATE.md
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Add optuna hyperparameter search to Trainer (a usage sketch follows this list)
* @julien-c suggestions
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Make compute_objective an arg function
* Formatting
* Rework to make it easier to add ray
* Formatting
* Initial support for Ray
* Formatting
* Polish and finalize
* Add trial id to checkpoint with Ray
* Smaller default
* Use GPU in ray if available
* Formatting
* Fix test
* Update install instruction
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
* Address review comments
* Formatting post-merge
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
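A hedged usage sketch of the resulting API; `train_ds`/`eval_ds` are assumed, already-tokenized datasets and the model id is just an example:
```
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

def model_init():
    # A fresh model is instantiated for every trial
    return AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

trainer = Trainer(
    model_init=model_init,
    args=TrainingArguments(output_dir="./hpo_out"),
    train_dataset=train_ds,  # assumed to exist
    eval_dataset=eval_ds,    # assumed to exist
)
best_run = trainer.hyperparameter_search(direction="minimize", backend="optuna", n_trials=10)
print(best_run.hyperparameters)
```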
Tested in a local build of the docs.
e.g. just above https://huggingface.co/transformers/task_summary.html#causal-language-modeling
Copy will copy the full code, e.g.
```
for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
```
instead of, currently, only:
```
>>> for token in top_5_tokens:
...     print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))
```
Distilled models are smaller than the models they mimic. Using them instead of the large versions would help reduce our carbon footprint.
Docs for the option fix:
https://sphinx-copybutton.readthedocs.io/en/latest/
* Feed forward chunking for Distilbert & Albert
* Added ff chunking for many other models
* Change model signature
* Added chunking for XLM
* Cleaned up by removing some variables.
* remove test_chunking flag
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
As discussed at https://github.com/huggingface/transformers/issues/6317, codecov currently sends an invalid report when it fails to find a code coverage report for the base it checks against, so this gets fixed by:
- require_base: yes # don't report if there is no base coverage report
Let's add this for clarity; this is supposedly already the default.
- require_head: yes # don't report if there is no head coverage report
And perhaps there is no point reporting on doc changes, as they don't make any difference and it just generates noise:
- require_changes: true # only comment if there was change in coverage
* [doc] multiple corrections to "Summary of the tasks"
* fix indentation
* correction
* fix links, add links to examples/seq2seq/README.md instead of non-existing script
* [testing] switch to a new TestCasePlus + get_auto_remove_tmp_dir() for auto-removal of tmp dirs
* respect after=True for tempfile, simplify code
* comments
* comment fix
* put `before` last in args, so can make debug even faster
Currently, with the bug introduced, we're taking two optimizer steps per
batch: one global step, where `xm.optimizer_step` injects a CRS (cross-replica sum) between
all cores in training, and one without it. This has been affecting training
accuracy (for example, XLNet GLUE on MNLI is not converging, etc.).
* Generation doc
* MBartForConditionalGeneration (#6441)
* add MBartForConditionalGeneration
* style
* rebase and fixes
* add mbart test in TEST_FILES_WITH_NO_COMMON_TESTS
* fix docs
* don't ignore mbart
* doc
* fix mbart fairseq link
* put mbart before bart
* apply doc suggestions
* Use hash to clean the test dirs (#6475)
* Use hash to clean the test dirs
* Use hash to clean the test dirs
* Use hash to clean the test dirs
* fix
* [EncoderDecoder] Add Cross Attention for GPT2 (#6415)
* add cross attention layers for gpt2
* make gpt2 cross attention work
* finish bert2gpt2
* add explicit comments
* remove attention mask since not yet supported
* revert attn mask in pipeline
* Update src/transformers/modeling_gpt2.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/modeling_encoder_decoder.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Sort unique_no_split_tokens to make it deterministic (#6461)
* change unique_no_split_tokens's type to set
* use sorted list instead of set
* style
* Import accuracy_score (#6480)
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address comments
* Styling
* Generation doc
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address comments
* Styling
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
Co-authored-by: gijswijnholds <gijswijnholds@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add more token classification examples
* POS tagging example
* Phrase chunking example
* PR review fixes
* Add conllu to third party list (used in token classification examples)
* cleanup torch unittests: part 2
* remove trailing comma added by isort, and which breaks flake
* one more comma
* revert odd balls
* part 3: odd cases
* more ["key"] -> .key refactoring
* .numpy() is not needed
* more unncessary .numpy() removed
* more simplification
* Data collator with padding
* Add type annotation
* Support tensors as well
* Add comment
* Fix for labels wrong shape
* Data collator with padding
* Add type annotation
* Support tensors as well
* Add comment
* Fix for labels wrong shape
* Remove changes rendered unnecessary
* allow using tokenizer.pad as a collate_fn in pytorch (a usage sketch follows this list)
* allow using tokenizer.pad as a collate_fn in pytorch
* Add documentation and tests
* Make attention mask the right shape
* Better test
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
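A minimal usage sketch of dynamic padding with `DataCollatorWithPadding` as a `collate_fn` (model id and toy inputs are illustrative):
```
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
collator = DataCollatorWithPadding(tokenizer)  # pads each batch to its longest member

# un-padded features, as produced by the tokenizer
features = [tokenizer("short text"), tokenizer("a somewhat longer piece of text")]
loader = DataLoader(features, batch_size=2, collate_fn=collator)

batch = next(iter(loader))
print(batch["input_ids"].shape, batch["attention_mask"].shape)
```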
* replace capsys with the more refined CaptureStderr/CaptureStdout
* Update examples/seq2seq/test_seq2seq_examples.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* [wip] add get_polynomial_decay_schedule_with_warmup (a usage sketch follows this list)
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* [model_cards] electra-base-turkish-cased-ner (#6350)
* for electra-base-turkish-cased-ner
* Add metadata
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Temporarily de-activate TPU CI
* Update modeling_tf_utils.py (#6372)
fix typo: ckeckpoint->checkpoint
* the test now works again (#6371)
* correct pl link in readme (#6364)
* refactor almost identical tests (#6339)
* refactor almost identical tests
* important to add a clear assert error message
* make the assert error even more descriptive than the original bt
* Small docfile fixes (#6328)
* Patch models (#6326)
* TFAlbertFor{TokenClassification, MultipleChoice}
* Patch models
* BERT and TF BERT info
* Update check_repo
* Ci GitHub caching (#6382)
* Cache Github Actions CI
* Remove useless file
* Colab button (#6389)
* Add colab button
* Add colab link for tutorials
* Fix links for open in colab (#6391)
* Update src/transformers/optimization.py
consistently use lr_end=1e-7 default
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* [wip] add get_polynomial_decay_schedule_with_warmup
* style
* add assert
* change lr_end to a much smaller default number
* check for exact equality
* Update src/transformers/optimization.py
consistently use lr_end=1e-7 default
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* remove dup (leftover from merge)
* convert the test into the new refactored format
* stick to using the current_step as is, without ++
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Alexander Measure <ameasure@gmail.com>
Co-authored-by: Rohit Gupta <rohitgr1998@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
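A hedged usage sketch of the new scheduler (the tiny linear model is just a stand-in so the snippet runs; `lr_end=1e-7` mirrors the default discussed above):
```
import torch
from transformers import AdamW, get_polynomial_decay_schedule_with_warmup

model = torch.nn.Linear(10, 2)  # stand-in model
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=1000, lr_end=1e-7, power=1.0
)

for _ in range(10):  # in the real training loop this follows each optimizer update
    optimizer.step()
    scheduler.step()
```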
* Optimized banned token masking (a `bad_words_ids` usage sketch follows this list)
* Avoid duplicate EOS masking if in bad_words_id
* Updated mask generation to handle empty banned token list
* Addition of unit tests for the updated bad_words_ids masking
* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test
* Updated timeout handling in `test_postprocess_next_token_scores_large_bad_words_list` unit test (timeout does not work on Windows)
* Moving Marian import to the test context to allow TF only environments to run
* Moving imports to torch_available test
* Updated operations device and test
* Updated operations device and test
* Added docstring and comment for in-place scores modification
* Moving test to own test_generation_utils, use of lighter models for testing
* removed unneded imports in test_modeling_common
* revert formatting change for ModelTesterMixin
* Updated caching, simplified eos token id test, removed unnecessary @require_torch
* formatting compliance
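A minimal sketch of the `generate(bad_words_ids=...)` path this optimizes (model id and the banned words are illustrative):
```
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Each banned word becomes a list of token ids; the leading space matters for GPT-2's BPE
bad_words_ids = [tokenizer.encode(word, add_special_tokens=False) for word in [" stupid", " idiot"]]

inputs = tokenizer("The weather today is", return_tensors="pt")
output = model.generate(**inputs, max_length=20, bad_words_ids=bad_words_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```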
* Warn if debug requested without TPU fixes (#6308)
Check whether a PyTorch-compatible TPU is available before attempting to print TPU metrics after training has completed. This way, users who apply `--debug` without reading the documentation aren't surprised by a stacktrace.
* Style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* add pl_glue example test
* for now just test that it runs, next validate results of eval or predict?
* complete the run_pl_glue test to validate the actual outcome
* worked on my machine, CI gets less accuracy - trying higher epochs
* match run_pl.sh hparms
* more epochs?
* trying higher lr
* for now just test that the script runs to a completion
* correct the comment
* if cuda is available, add --fp16 --gpus=1 to cover more bases
* style
* Chunked feed forward for Bert
This is an initial implementation to test applying feed forward chunking for BERT.
Will need additional modifications based on output and benchmark results (a generic sketch of the idea follows this list).
* Black and cleanup
* Feed forward chunking in BertLayer class.
* Isort
* add chunking for all models
* fix docs
* Fix typo
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
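A generic illustration of feed-forward chunking (not the library's exact helper): split the sequence dimension into chunks, run the feed-forward block on each chunk, and concatenate, trading a bit of speed for lower peak memory.
```
import torch

def chunked_ffn(ffn, hidden_states, chunk_size, dim=1):
    # chunk_size == 0 means "no chunking"
    if chunk_size == 0:
        return ffn(hidden_states)
    chunks = hidden_states.split(chunk_size, dim=dim)
    return torch.cat([ffn(chunk) for chunk in chunks], dim=dim)

ffn = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.GELU(), torch.nn.Linear(64, 16))
x = torch.randn(2, 128, 16)               # (batch, seq_len, hidden)
out = chunked_ffn(ffn, x, chunk_size=32)  # same result as ffn(x), lower peak memory
print(out.shape)
```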
* improve names and tests longformer
* more and better tests for longformer
* add first tf test
* finalize tf basic op functions
* fix merge
* tf shape test passes
* narrow down discrepancies
* make longformer local attn tf work
* correct tf longformer
* add first global attn function
* add more global longformer func
* advance tf longformer
* finish global attn
* upload big model
* finish all tests
* correct false any statement
* fix common tests
* make all tests pass except keras save load
* fix some tests
* fix torch test import
* finish tests
* fix test
* fix torch tf tests
* add docs
* finish docs
* Update src/transformers/modeling_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* apply Lysandres suggestions
* reverse to assert statement because function will fail otherwise
* applying sylvains recommendations
* Update src/transformers/modeling_longformer.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Update src/transformers/modeling_tf_longformer.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
as discussed with @sshleifer, removing this TODO to switch to a tiny model, since it won't be able to test the results of the evaluation (i.e. the results are meaningless).
* Add a script to check all models are tested and documented
* Apply suggestions from code review
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Address comments
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* Single workflow cache test
Remove cache dir, re-trigger cache
Only pip archives
Not sudo when pip
* All workflow cache
Remove no-cache-dir instruction
Remove last sudo occurrences
v0.3
* Support for Comet.ml
* Need to import comet first
* Log this model, not the one in the backprop step
* Log args as hyperparameters; use framework to allow fine control
* Log hyperparameters with context
* Apply black formatting
* isort fix integrations
* isort fix __init__
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/trainer_tf.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Address review comments
* Style + Quality, remove Tensorboard import test
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Add strip_accents to basic tokenizer (a usage sketch follows this list)
* Add tests for strip_accents.
* fix style with black
* Fix strip_accents test
* empty commit to trigger CI
* Improved strip_accents check
* Add code quality with is not False
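A hedged usage sketch; the checkpoint is just an example and the kwarg is forwarded to the tokenizer's init:
```
from transformers import BertTokenizer

# By default the uncased tokenizer strips accents while lowercasing;
# the new flag makes that behaviour explicit and overridable.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", strip_accents=False)
print(tokenizer.tokenize("héllo wörld"))
```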
* TF outputs and test on BERT
* Albert to DistilBert
* All remaining TF models except T5
* Documentation
* One file forgotten
* TF outputs and test on BERT
* Albert to DistilBert
* All remaining TF models except T5
* Documentation
* One file forgotten
* Add new models and fix issues
* Quality improvements
* Add T5
* A bit of cleanup
* Fix for slow tests
* Style
* Add SequenceClassification and MultipleChoice TF models to Electra
* Apply style
* Add summary_proj_to_labels to Electra config
* Finally mirroring the PT version of these models
* Apply style
* Fix Electra test
* support --lr_scheduler with multiple possibilities
* correct the error message
* add a note about supported schedulers
* cleanup
* cleanup2
* needs the argument default
* style
* add another assert in the test
* implement requested changes
* cleanups
* fix relative import
* cleanup
* Update to match renamed attributes in fairseq master
RobertaModel no longer has model.encoder and args.num_classes attributes as of 5/28/20.
* Quality
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Adding docs for how to load encoder_decoder pretrained model with individual config objects
* Adding docs for loading encoder_decoder config from pretrained folder
* Fixing W293 blank line contains whitespace
* Update src/transformers/modeling_encoder_decoder.py
* Update src/transformers/modeling_encoder_decoder.py
* Update src/transformers/modeling_encoder_decoder.py
* Apply suggestions from code review
model file should only show examples for how to load save model
* Update src/transformers/configuration_encoder_decoder.py
* Update src/transformers/configuration_encoder_decoder.py
* fix space
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* improve unit tests
this is a sample of one test according to the request in https://github.com/huggingface/transformers/issues/5973
before I apply it to the rest
* batch 1
* batch 2
* batch 3
* batch 4
* batch 5
* style
* non-tf template
* last deletion of check_loss_output
* Fix TF Serving when output_hidden_states and output_attentions are True
* Add tests for saved model creation + bug fix for multiple choices models
* remove unused import
* Fix the input for several layers
* Fix test
* Fix conflict printing
* Apply style
* Fix XLM and Flaubert for TensorFlow
* Apply style
* Fix TF check version
* Apply style
* Trigger CI
* Add script to convert tf2.x checkpoint to pytorch
The script converts the newer TF2.x checkpoints (as published on their official GitHub: https://github.com/tensorflow/models/tree/master/official/nlp/bert) to Pytorch.
* rename file in order to stay consistent with naming convention
* Replace mecab-python3 with fugashi
This replaces mecab-python3 with fugashi for Japanese tokenization. I am
the maintainer of both projects.
Both projects are MeCab wrappers, so the underlying C++ code is the
same. fugashi is the newer wrapper and doesn't use SWIG, so for basic
use of the MeCab API it's easier to use.
This code ensures the use of a version of ipadic installed via pip,
which should make versioning and tracking down issues easier.
fugashi has wheels for Windows, OSX, and Linux, which will help with
issues with installing old versions of mecab-python3 on Windows.
Compared to mecab-python3, because fugashi doesn't use SWIG, it doesn't
require a C++ runtime to be installed on Windows.
In adding this change I removed some code dealing with `cursor`,
`token_start`, and `token_end` variables. These variables didn't seem to
be used for anything; it is unclear to me why they were there.
I ran the tests and they passed, though I couldn't figure out how to run
the slow tests (`--runslow` gave an error) and didn't try testing with
Tensorflow.
* Style fix
* Remove unused variable
Forgot to delete this...
* Adapt doc with install instructions
* Fix typo
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* enable easy checkout switch
Allow having multiple repository checkouts without needing to remember to rerun 'pip install -e .[dev]' when switching between checkouts and running tests.
* make isort happy
* examples needs one too
* fixed type; add Pytorch Native CUDA AMP support
* reverted commit on modeling_utils
* confirming to HF black formatting rule
* changed bool value of _use_apex
* scaler support for gradient clipping
* fix inplace operation of clip_grad_norm
* removed not while version comparison
* Add onnxruntime transformers optimization support
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added Optimization section in ONNX/ONNXRuntime documentation.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Improve note reference
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fixing imports order.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add warning about different level of optimization between torch and tf export.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Address @LysandreJik wording suggestion
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address @LysandreJik wording suggestion
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Always optimize model before quantization for maximum performances.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Address comments on the documentation.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Improve TensorFlow optimization message as suggested by @yufenglee
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Removed --optimize parameter
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Warn the user about current quantization limitation when model is larger than 2GB.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Trigger CI for last check
* Small change in print for the optimization section.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* initial commit for pipeline implementation
Addition of input processing and history concatenation
* Conversation pipeline tested and working for single & multiple conversation inputs (a usage sketch follows this list)
* Added docstrings for dialogue pipeline
* Addition of dialogue pipeline integration tests
* Delete test_t5.py
* Fixed max code length
* Updated styling
* Fixed test broken by formatting tools
* Removed unused import
* Added unit test for DialoguePipeline
* Fixed Tensorflow compatibility
* Fixed multi-framework support using framework flag
* - Fixed docstring
- Added `min_length_for_response` as an initialization parameter
- Renamed `*args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]`
- Updated truncation to truncate entire segments of conversations, instead of cutting in the middle of a user/bot input
* - renamed pipeline name from dialogue to conversational
- removed hardcoded default value of 1000 and use config.max_length instead
- added `append_response` and `set_history` method to the Conversation class to avoid direct fields mutation
- fixed bug in history truncation method
* - Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised)
* - Simplified input tensor conversion
* - Updated attention_mask value for Tensorflow compatibility
* - Updated last dialogue reference to conversational & fixed integration tests
* Fixed conflict with master
* Updates following review comments
* Updated formatting
* Added Conversation and ConversationalPipeline to the library __init__, addition of docstrings for Conversation, added both to the docs
* Update src/transformers/pipelines.py
Updated docsting following review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
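A hedged usage sketch of the resulting pipeline (the DialoGPT checkpoint is just an example):
```
from transformers import pipeline, Conversation

chatbot = pipeline("conversational", model="microsoft/DialoGPT-medium")

conversation = Conversation("Going to the movies tonight - any suggestions?")
conversation = chatbot(conversation)
print(conversation.generated_responses[-1])

conversation.add_user_input("Is it an action movie?")
conversation = chatbot(conversation)
print(conversation.generated_responses[-1])
```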
* Switch from return_tuple to return_dict
* Fix test
* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)
* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests
* AutoModels
Tiny tweaks
* Style
* Final changes before merge
* Re-order for simpler review
* Final fixes
* Addressing @sgugger's comments
* Test MultipleChoice
* Rework TF trainer (#6038)
* Fully rework training/prediction loops
* fix method name
* Fix variable name
* Fix property name
* Fix scope
* Fix method name
* Fix tuple index
* Fix tuple index
* Fix indentation
* Fix variable name
* fix eval before log
* Add drop remainder for test dataset
* Fix step number + fix logging datetime
* fix eval loss value
* use global step instead of step + fix logging at step 0
* Fix logging datetime
* Fix global_step usage
* Fix breaking loop + logging datetime
* Fix step in prediction loop
* Fix step breaking
* Fix train/test loops
* Force TF at least 2.2 for the trainer
* Use assert_cardinality to facilitate the dataset size computation
* Log steps per epoch
* Make tfds compliant with TPU
* Make tfds compliant with TPU
* Use TF dataset enumerate instead of the Python one
* revert previous commit
* Fix data_dir
* Apply style
* rebase on master
* Address Sylvain's comments
* Address Sylvain's and Lysandre comments
* Trigger CI
* Remove unused import
* Switch from return_tuple to return_dict
* Fix test
* Add recent model
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>
* Fully rework training/prediction loops
* fix method name
* Fix variable name
* Fix property name
* Fix scope
* Fix method name
* Fix tuple index
* Fix tuple index
* Fix indentation
* Fix variable name
* fix eval before log
* Add drop remainder for test dataset
* Fix step number + fix logging datetime
* fix eval loss value
* use global step instead of step + fix logging at step 0
* Fix logging datetime
* Fix global_step usage
* Fix breaking loop + logging datetime
* Fix step in prediction loop
* Fix step breaking
* Fix train/test loops
* Force TF at least 2.2 for the trainer
* Use assert_cardinality to facilitate the dataset size computation
* Log steps per epoch
* Make tfds compliant with TPU
* Make tfds compliant with TPU
* Use TF dataset enumerate instead of the Python one
* revert previous commit
* Fix data_dir
* Apply style
* rebase on master
* Address Sylvain's comments
* Address Sylvain's and Lysandre comments
* Trigger CI
* Remove unused import
* Added capability to quantize a model while exporting through ONNX (a usage sketch follows this list).
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
We do not support multiple extensions
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Reformat files
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* More quality
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Ensure test_generate_identified_name compares the same object types
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added documentation everywhere on ONNX exporter
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use pathlib.Path instead of plain-old string
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use f-string everywhere
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use the correct parameters for black formatting
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use Python 3 super() style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use packaging.version to ensure installed onnxruntime version match requirements
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fixing imports sorting order.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Missing raise(s)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added quantization documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix some spelling.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix bad list header format
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Move torchscript and add ONNX documentation under modle_export
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove previously introduced tree element
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* WIP doc
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* ONNX documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix invalid link
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Improve spelling
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Final wording pass
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
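A hedged sketch of the exporter plus the new quantization step, based on the `convert_graph_to_onnx` helpers described above (model id and paths are illustrative; the output folder is expected to be new or empty):
```
from pathlib import Path

from transformers.convert_graph_to_onnx import convert, quantize

onnx_path = Path("onnx/bert-base-cased.onnx")

# Export the PyTorch model to ONNX
convert(framework="pt", model="bert-base-cased", output=onnx_path, opset=11)

# Produce a quantized copy of the exported graph; the path of the quantized model is returned
quantized_path = quantize(onnx_path)
print(quantized_path)
```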
* Moving `from transformers` statements to relative imports in some files under src/
* Import order
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Currently, it's hard to tell which example tests were run on CI and which weren't. Adding the `-rA` flag to `pytest` will now include a summary like:
```
==================================================================== short test summary info =====================================================================
PASSED examples/test_examples.py::ExamplesTests::test_generation
PASSED examples/test_examples.py::ExamplesTests::test_run_glue
PASSED examples/test_examples.py::ExamplesTests::test_run_language_modeling
PASSED examples/test_examples.py::ExamplesTests::test_run_squad
FAILED examples/test_examples.py::ExamplesTests::test_run_pl_glue - AttributeError: 'Namespace' object has no attribute 'gpus'
============================================================ 1 failed, 4 passed, 8 warnings in 42.96s ============================================================
```
which makes it easier to validate whether some example is being covered by CI or not.
* Ensure OpenAI GPT position_ids is correctly initialized and registered as buffer at init.
This will make it compatible with TorchScript export.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix missing slice operator on the tensor data accessor.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fixed BertEmbedding position_ids buffer created at forward.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fixed MobileBertEmbedding position_ids buffer created at forward.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fixed XLM position_ids buffer created at forward.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Describe usage of sentence model
* fix typo usage
* add use and description to readme
* fix typo in readme
* readme formatting
* add training procedure to readme
* description name and company
* readme formatting
* dataset training readme
* typo
* readme
* minor doc fixes
correct superclass name and small grammar fixes
* correct the instance name in the error message
It appears to be `BaseTokenizer` from looking at:
`from tokenizers.implementations import BaseTokenizer as BaseTokenizerFast`
and not `Tokenizer` as it currently says.
* Attempt to fix the way squad_convert_examples_to_features pad the elements for the QA pipeline.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Quality
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make the code easier to read and avoid multiple tests testing the same thing.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* missing enum value on truncation_strategy.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Rethinking for the easiest fix: expose the padding strategy on squad_convert_examples_to_features.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove unused imports.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Created model card for my extreme summarization model
* Update model_cards/yuvraj/xSumm/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Created model card for my summarization model
* Update model_cards/yuvraj/summarizer-cnndm/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* DataParallel fixes:
1. switched to a more precise check
- if self.args.n_gpu > 1:
+ if isinstance(model, nn.DataParallel):
2. fix tests - require the same fixup under DataParallel as the training module
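A minimal sketch of the check described above (the `unwrap_model` helper name is hypothetical, not part of the change):
```
import torch.nn as nn

def unwrap_model(model: nn.Module) -> nn.Module:
    # A DataParallel wrapper keeps the real model under `model.module`;
    # checking the instance is more precise than counting GPUs via n_gpu.
    return model.module if isinstance(model, nn.DataParallel) else model
```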
* another fix
* Don't pass sampler for iterable dataset
* Added check for test and eval dataloaders.
* Formatting
* Don't pass sampler for iterable dataset
* Added check for test and eval dataloaders.
* Formatting
* Cleaner if nesting.
* Added test for trainer and iterable dataset
* Formatting for test
* Fixed import when torch is available only.
* Added require torch decorator to helper class
* Moved dataset class inside unittest
* Removed nested if and changed model in test
* Checking torch availability for IterableDataset
Slightly breaking change, changes functionality for `use_cache` in XLNet: if use_cache is True and mem_len is 0 or None (which is the case in the base model config), the model behaves like GPT-2 and returns mems to be used as past in generation. At training time `use_cache` is overridden and always True.
* fix merge rebase
* add intermediate reformer code
* save intermediate caching results
* save intermediate
* save intermediate results
* save intermediate
* upload next step
* fix generate tests
* make tests work
* add named tuple output
* Apply suggestions from code review
* fix use_cache for False case
* fix tensor to gpu
* fix tensor to gpu
* refactor
* refactor and make style
* language tag addition on albert-mongolian
* Update model_cards/bayartsogt/albert-mongolian/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Reformer classification head implementation for text classification
* Reformat the reformer model classification code
* PR review comments, and test case implementation for reformer for classification head changes
* CI/CD reformer for classification head test import error fix
* CI/CD test case implementation added ReformerForSequenceClassification to all_model_classes
* Code formatting- fixed
* Normal test cases added for reformer classification head
* Fix test cases implementation for the reformer classification head
* removed token_type_id parameter from the reformer classification head
* fixed the test case for reformer classification head
* merge conflict with master fixed
* merge conflict, changed reformer classification to accept the choice_label parameter added in latest code
* refactored the reformer classification head test code
* reformer classification head, common transform test cases fixed
* final set of review comments: rearranged the reformer classes and added a docstring to the classification forward method
* fixed the compilation error and the test case for the reformer classification head
* Apply suggestions from code review
Remove unnecessary dup
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* fix longformer global attention output
* fix multi gpu problem
* replace -10000 with 0
* better comment
* make attention output equal local and global
* Update src/transformers/modeling_longformer.py
* Add model type check for pipelines
* Add model type check for pipelines
* rename func
* Fix the init parameters
* Fix format
* rollback unnecessary refactor
* Pytorch gpu => cpu proper device
* Memoryless XLNet warning + fixed memories during generation
* Revert "Memoryless XLNet warning + fixed memories during generation"
This reverts commit 3d3251ff
* Took the operations on the generated_sequence out of the ensure_device scope
* Ensure padding and question cannot have higher probs than context.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Add bart to the list of tokenizers adding two <sep> tokens for squad_convert_example_to_feature
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Format.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing @patrickvonplaten comments.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing @patrickvonplaten comments about masking non-context element when generating the answer.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing @sshleifer comments.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Make sure we mask CLS after handling impossible answers
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Mask in the correct vectors ...
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Add B I handling to grouping
* Add fix to include separate entity as last token
* move last_idx definition outside loop
* Use first entity in entity group as reference for entity type
* Add test cases
* Take out extra class accidentally added
* Return tf ner grouped test to original
* Take out redundant last entity
* Get last_idx safely
Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>
* Fix first entity comment
* Create separate functions for group_sub_entities and group_entities (splitting call method to testable functions)
* Take out unnecessary last_idx
* Remove additional forward pass test
* Move token classification basic tests to separate class
* Move token classification basic tests back to MonoColumnInputTestCase
* Move base ner tests to NerPipelineTests
* Take out unused kwargs
* Add back mandatory_keys argument
* Add unitary tests for group_entities in _test_ner_pipeline
* Fix last entity handling
* Fix grouping function used
* Add typing to group_sub_entities and group_entities
Co-authored-by: ColleterVi <36503688+ColleterVi@users.noreply.github.com>
* Add deebert code
* Add readme of deebert
* Add test for deebert
Update test for Deebert
* Update DeeBert (README, class names, function refactoring); remove requirements.txt
* Format update
* Update test
* Update readme and model init methods
* Default decoder inputs to encoder ones for T5 if neither are specified.
* Fixing typo, now all tests are passing.
* Changing einsum to operations supported by onnx
* Adding a test to ensure T5 can be exported to onnx op>9
* Modified test for onnx export to make it faster
* Styling changes.
* Styling changes.
* Changing notation for matrix multiplication
Co-authored-by: Abel Riboulot <tkai@protomail.com>
* Added data collator for XLNet language modeling and related calls
Added DataCollatorForXLNetLanguageModeling in data/data_collator.py
to generate necessary inputs for language modeling training with
XLNetLMHeadModel. Also added related arguments, logic and calls in
examples/language-modeling/run_language_modeling.py.
Resolves: #4739, #2008 (partially)
* Changed name to `DataCollatorForPermutationLanguageModeling`
Changed the name of `DataCollatorForXLNetLanguageModeling` to the more general `DataCollatorForPermutationLanguageModeling`.
Removed the `--mlm` flag requirement for the new collator and defined a separate `--plm_probability` flag for its use.
CTRL uses a CLM loss just like GPT and GPT-2, so it should work out of the box with this script (provided `past` is taken care of similarly to `mems` for XLNet).
Changed calls and imports appropriately.
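A hedged usage sketch of the renamed collator (checkpoint name and probability values are assumptions, not taken from this change):
```
from transformers import (
    DataCollatorForPermutationLanguageModeling,
    XLNetLMHeadModel,
    XLNetTokenizer,
)

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")
collator = DataCollatorForPermutationLanguageModeling(
    tokenizer=tokenizer,
    plm_probability=1 / 6,   # fraction of each sequence masked for prediction
    max_span_length=5,       # maximum length of a masked span of tokens
)
```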
* Added detailed comments, changed variable names
Added more detailed comments to `DataCollatorForPermutationLanguageModeling` in `data/data_collator.py` to explain how it works. Also cleaned up variable names and made them more informative.
* Added tests for new data collator
Added tests in `tests/test_trainer.py` for DataCollatorForPermutationLanguageModeling based on those in DataCollatorForLanguageModeling. A specific test has been added to check for odd-length sequences.
* Fixed styling issues
* Exposing prepare_for_model for both slow & fast tokenizers
* Update method signature
* The traditional style commit
* Hide the warnings behind the verbose flag
* update default truncation strategy and prepare_for_model
* fix tests and prepare_for_models methods
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Make QA pipeline supports models with more than 2 outputs such as BART assuming start/end are the two first outputs.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* When using the new padding/truncation paradigm setting padding="max_length" + max_length=X actually pads the input up to max_length.
This results in every sample going through QA pipelines being of size 384 whatever the actual input size is, making the overall pipeline very slow.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
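An illustrative sketch of the difference (tokenizer name is an assumption): padding="max_length" pads every sample to max_length, while padding="longest" pads only up to the longest sample in the batch.
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = ["a short question?", "a somewhat longer question about padding behavior?"]

fixed = tokenizer(batch, padding="max_length", max_length=384, truncation=True)
dynamic = tokenizer(batch, padding="longest", truncation=True)
print(len(fixed["input_ids"][0]), len(dynamic["input_ids"][0]))  # 384 vs. batch max
```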
* Mask padding & question before applying softmax. Softmax has been refactored to operate in log space for speed and stability.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Format.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Use PaddingStrategy.LONGEST instead of DO_NOT_PAD
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Revert "When using the new padding/truncation paradigm setting padding="max_length" + max_length=X actually pads the input up to max_length."
This reverts commit 1b00a9a2
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Trigger CI after unattended failure
* Trigger CI
* Work on tokenizer summary
* Finish tutorial
* Link to it
* Apply suggestions from code review
Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Add vocab definition
Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Added PipelineException
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fill-mask pipeline raises an exception when more than one mask_token is detected.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Put everything in a function.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Added tests on pipeline fill-mask when input has != 1 mask_token
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Fix numel() computation for TF
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Addressing PR comments.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Remove function typing to avoid import on specific framework.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Quality.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Retry typing with @julien-c tip.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Quality².
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Simplify fill-mask mask_token checking.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* Trigger CI
* Add support for past states
* Style and forgotten self
* You mean, documenting is not enough? I have to actually add it too?
* Add memory support during evaluation
* Fix tests in eval and add TF support
* No need to change this line anymore
Otherwise, if label is not specified, the following error occurs:
```
Traceback (most recent call last):
  File "run_ner.py", line 303, in <module>
    main()
  File "run_ner.py", line 101, in main
    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
  File "/home/user/anaconda3/envs/bert/lib/python3.7/site-packages/transformers/hf_argparser.py", line 159, in parse_json_file
    obj = dtype(**inputs)
TypeError: __init__() missing 1 required positional argument: 'labels'
```
* Fix the bug 'Attempted relative import with no known parent package' when using the bertabs example. Also change the used model from bertabs-finetuned-cnndm, since it seems not to be accessible anymore
* Update run_summarization.py
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* remove references to old API in docstring - update data processors
* style
* fix tests - better type checking error messages
* better type checking
* include awesome fix by @LysandreJik for #5310
* updated doc and examples
* Add new parameter `pad_to_multiple_of` on tokenizers.
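A hedged usage sketch of the new parameter (tokenizer name is an assumption): the padded length is rounded up to the next multiple of the given value.
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("a short sentence", padding=True, pad_to_multiple_of=8)
print(len(enc["input_ids"]))  # length is rounded up to a multiple of 8
```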
* unittest for pad_to_multiple_of
* Add .name when logging enum.
* Fix missing .items() on dict in tests.
* Add special check + warning if the tokenizer doesn't have proper pad_token.
* Use the correct logger format specifier.
* Ensure tokenizers with no pad_token do not modify the underlying padding strategy.
* Skip test if tokenizer doesn't have pad_token
* Fix RobertaTokenizer on empty input
* Format.
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
* fix and updating to simpler API
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* avoid recursion in id checks for fast tokenizers
* better typings and fix#5232
* align slow and fast tokenizers behaviors for Roberta and GPT2
* style and quality
* fix tests - improve typings
* fix-5181
Padding to max sequence length while truncating to another length was wrong on slow tokenizers
* clean up and fix#5155
* fix XLM test
* Fix tests for Transfo-XL
* logging only above WARNING in tests
* switch slow tokenizers tests in @slow
* fix Marian truncation tokenization test
* style and quality
* make the test a lot faster by limiting the sequence length used in tests
* Add return lengths
* make pad a bit more flexible so it can be used as collate_fn
* check all kwargs sent to encoding method are known
* fixing kwargs in encodings
* New AddedToken class in python
This class lets you specify specific tokenization behaviors for some special tokens. Used in particular for GPT2 and Roberta, to control how whitespace is stripped around special tokens.
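A hedged sketch of the behavior this enables (the import path and tokenizer name are assumptions; a later commit below moves AddedTokens to the huggingface tokenizers library):
```
from tokenizers import AddedToken
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
# lstrip=True makes "<mask>" absorb the space on its left, so " <mask>" and
# "<mask>" tokenize consistently.
tokenizer.add_special_tokens({"mask_token": AddedToken("<mask>", lstrip=True, rstrip=False)})
```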
* style and quality
* switched to the huggingface tokenizers library for AddedTokens
* up to tokenizer 0.8.0-rc3 - update API to use AddedToken state
* style and quality
* do not raise an error on additional or unused kwargs for tokenize() but only a warning
* transfo-xl pretrained model requires torch
* Update src/transformers/tokenization_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Cleaner warning when loading pretrained models
This makes the logging messages more explicit when using the various `from_pretrained` methods. It also emits these messages as `logging.warning` because this is a common source of silent mistakes.
* Update src/transformers/modeling_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Update src/transformers/modeling_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* style and quality
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* fix#5081 and improve backward compatibility (slightly)
* add nlp to setup.cfg - style and quality
* align default to previous default
* remove test that doesn't generalize
* add support for gradient checkpointing in BERT
* fix unit tests
* isort
* black
* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool
* Revert "workaround for `torch.utils.checkpoint.checkpoint` not accepting bool"
This reverts commit 5eb68bb804f5ffbfc7ba13c45a47717f72d04574.
* workaround for `torch.utils.checkpoint.checkpoint` not accepting bool
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
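A hedged sketch, assuming the feature is toggled through a gradient_checkpointing flag on the BERT config (the checkpoint name is also an assumption):
```
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-uncased", gradient_checkpointing=True)
model = BertModel.from_pretrained("bert-base-uncased", config=config)
# Forward activations are recomputed during backward, trading compute for memory.
```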
* Configure all models to use output_hidden_states as argument passed to forward()
* Pass all tests
* Remove cast_bool_to_primitive in TF Flaubert model
* correct tf xlnet
* add pytorch test
* add tf test
* Fix broken tests
* Configure all models to use output_hidden_states as argument passed to forward()
* Pass all tests
* Remove cast_bool_to_primitive in TF Flaubert model
* correct tf xlnet
* add pytorch test
* add tf test
* Fix broken tests
* Refactor output_hidden_states for mobilebert
* Reset and remerge to master
Co-authored-by: Joseph Liu <joseph.liu@coinflex.com>
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>
* Fixed resize_token_embeddings for transfo_xl model
* Fixed resize_token_embeddings for transfo_xl.
Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.
* Updated docstring
* Fixed resizing cutoffs; added check for new size of embedding layer.
* Added test for resize_token_embeddings
* Fixed code quality
* Fixed unchanged cutoffs in model.config
* Added feature to move added tokens in tokenizer.
* Fixed code quality
* Added feature to move added tokens in tokenizer.
* Fixed code quality
* Fixed docstring, renamed sym to token.
Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
* Add BERT Loses Patience (Patience-based Early Exit)
* update model archive
* update format
* sort import
* flake8
* Add results
* full results
* align the table
* refactor to inherit
* default per gpu eval = 1
* Formatting
* Formatting
* isort
* modify readme
* Add check
* Fix format
* Fix format
* Doc strings
* ALBERT & BERT for sequence classification don't inherit from the original anymore
* Remove incorrect comments
* Remove incorrect comments
* Remove incorrect comments
* Sync up with new code
* Sync up with new code
* Add a test
* Add a test
* Add a test
* Add a test
* Add a test
* Add a test
* Finishing up!
* add ElectraForMultipleChoice
* add test_for_multiple_choice
* add ElectraForMultipleChoice in auto model
* add ElectraForMultipleChoice in all_model_classes
* add SequenceSummary related parameters
* get rid pooler, use SequenceSummary instead
* add electra multiple choice test
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer.
* Added __getstate__() & __setstate__() to be picklable.
* Correct tokens() return type from List[int] to List[str]
* Added unittest for BatchEncoding pickle/unpickle
* Added unittest for BatchEncoding is_fast
* More careful checking on BatchEncoding unpickle tests.
* Formatting.
* is_fast should assertTrue on Rust tokenizers.
* Ensure tensorflow has correct way of checking array_equal
* More formatting.
* Update hans data to be able to use Trainer
* Fixes
* Deal with tokenizers that don't have token_ids
* Clean up things
* Simplify data use
* Fix the input dict
* Formatting + proper path in README
* Fixed resize_token_embeddings for transfo_xl model
* Fixed resize_token_embeddings for transfo_xl.
Added custom methods to TransfoXLPreTrainedModel for resizing layers of
the AdaptiveEmbedding.
* Updated docstring
* Fixed resizing cutoffs; added check for new size of embedding layer.
* Added test for resize_token_embeddings
* Fixed code quality
* Fixed unchanged cutoffs in model.config
Co-authored-by: Rafael Weingartner <rweingartner.its-b2015@fh-salzburg.ac.at>
* check type before logging to ensure it's a scalar
* log when Trainer attempts to add a non-scalar value using TensorboardX's writer.add_scalar so we know what kinds of fixes are appropriate
* black it
* rephrase log message to clarify attribute was dropped
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* ElectraForQuestionAnswering
* update __init__
* add test for electra qa model
* add ElectraForQuestionAnswering in auto models
* add ElectraForQuestionAnswering in all_model_classes
* fix outputs, input_ids defaults to None
* add ElectraForQuestionAnswering in docs
* remove commented line
* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* Fix further regressions in tests relating to `output_attentions`
Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses
* Fix more regressions in `test_output_attentions`
* Fix issues with BertEncoder
* Rename related variables to `output_attentions`
* fix pytorch tests
* fix bert and gpt2 tf
* Fix most TF tests for `test_output_attentions`
* Fix linter errors and more TF tests
* fix conflicts
* DOC: Apply Black Formatting
* Fix errors where output_attentions was undefined
* Remove output_attentions in classes per review
* Fix regressions on tests having `output_attention`
* fix conflicts
* fix conflicts
* fix conflicts
* fix conflicts
* fix pytorch tests
* fix conflicts
* fix conflicts
* Fix linter errors and more TF tests
* fix tf tests
* make style
* fix isort
* improve output_attentions
* improve tensorflow
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* add tpu and torchscipt for benchmark
* fix name in tests
* "fix email"
* make style
* better log message for tpu
* add more print and info for tpu
* allow possibility to print tpu metrics
* correct cpu usage
* fix test for non-install
* remove bogus file
* include psutil in testing
* run a couple of times before tracing in torchscript
* do not allow tpu memory tracing for now
* make style
* add torchscript to env
* better name for torch tpu
Co-authored-by: Patrick von Platen <patrick@huggingface.co>
* Better None gradients handling
* Apply Style
* Apply Style
* Create a loss class per task to compute its respective loss
* Add loss classes to the ALBERT TF models
* Add loss classes to the BERT TF models
* Add question answering and multiple choice to TF Camembert
* Remove prints
* Add multiple choice model to TF DistilBERT + loss computation
* Add question answering model to TF Electra + loss computation
* Add token classification, question answering and multiple choice models to TF Flaubert
* Add multiple choice model to TF Roberta + loss computation
* Add multiple choice model to TF XLM + loss computation
* Add multiple choice and question answering models to TF XLM-Roberta
* Add multiple choice model to TF XLNet + loss computation
* Remove unused parameters
* Add task loss classes
* Reorder TF imports + add new model classes
* Add new model classes
* Bugfix in TF T5 model
* Bugfix for TF T5 tests
* Bugfix in TF T5 model
* Fix TF T5 model tests
* Fix T5 tests + some renaming
* Fix inheritance issue in the AutoX tests
* Add tests for TF Flaubert and TF XLM Roberta
* Add tests for TF Flaubert and TF XLM Roberta
* Remove unused piece of code in the TF trainer
* bugfix and remove unused code
* Bugfix for TF 2.2
* Apply Style
* Divide TFSequenceClassificationAndMultipleChoiceLoss into two classes with their respective names
* Apply style
* Mirror the PT Trainer in the TF one: fp16, optimizers and tb_writer as class parameters and better dataset handling
* Fix TF optimizations tests and apply style
* Remove useless parameter
* Bugfix and apply style
* Fix TF Trainer prediction
* Now the TF models return the loss like their PyTorch counterparts
* Apply Style
* Ignore some tests output
* Take into account the SQuAD cls_index, p_mask and is_impossible parameters for the QuestionAnswering task models.
* Fix names for SQuAD data
* Apply Style
* Fix conflicts with 2.11 release
* Fix conflicts with 2.11
* Fix wrong name
* Add better documentation on the new create_optimizer function
* Fix isort
* logging_dir: use same default as PyTorch
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* ner: add preprocessing script for examples that splits longer sentences
* ner: example shell scripts use local preprocessing now
* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results
* ner: satisfy black and isort
* Refactor tensor creation in tokenizers.
* Make sure to convert string to TensorType
* Refactor convert_to_tensors_
* Introduce numpy tensor creation
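A hedged illustration of the numpy path (tokenizer name is an assumption): passing return_tensors="np" yields numpy arrays instead of framework tensors.
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("hello world", return_tensors="np")
print(type(enc["input_ids"]))  # <class 'numpy.ndarray'>
```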
* Format
* Add unittest for TensorType creation from str
* sorting imports
* Added unittests for numpy tensor conversion.
* Do not use in-place version for squeeze as numpy doesn't provide such feature.
* Added extra parameter prepend_batch_axis: bool on prepare_for_model.
* Ensure test_np_encode_plus_sent_to_model is not executed if encoder/decoder model.
* style.
* numpy tests require_torch for now while flax not merged.
* Hopefully will make flake8 happy.
* One more time 🎶
* Ensure tokens in never_split are not split when using the basic tokenizer before wordpiece.
* never_split is only used for membership checks, so attempt to use a set(), which is 10x faster for this operation.
* Use union to concatenate two sets.
* Updated docstring for never_split parameter.
* Avoid set.union() if never_split is None
* Added comments.
* Correct docstring format.
* Added links to more community notebooks
Added links to 3 more community notebooks from the git repo: https://github.com/abhimishra91/transformers-tutorials
Different Transformers models are fine-tuned on datasets using PyTorch
* Update README.md
* Update README.md
* Update README.md
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* [hf_api] Attach all unknown attributes for future-proof compatibility
* [Pipeline] NerPipeline is really a TokenClassificationPipeline
* modelcard.py: I don't think we need to force the download
* Remove config, tokenizer from SUPPORTED_TASKS as we're moving to one model = one weight + one tokenizer
* FillMaskPipeline: also output token in string form
* TextClassificationPipeline: option to return all scores, not just the argmax
* Update docs/source/main_classes/pipelines.rst
* Glue task cleanup
* Enable writing cache to cache_dir in case the dataset lives in a read-only filesystem.
* Differentiate match vs mismatch for MNLI metrics.
* Style
* Fix pytype
* Fix type
* Use cache_dir in mnli mismatch eval dataset
* Small Tweaks
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Kill model archive maps
* Fixup
* Also kill model_archive_map for MaskedBertPreTrainedModel
* Unhook config_archive_map
* Tokenizers: align with model id changes
* make style && make quality
* Fix CI
* pass on tokenizer to pipeline
* order input names when convert to onnx
* update style
* remove unused imports
* make ordered inputs list mutable, as it needs to be
* add test custom bert model
* remove unused imports
* better api
* improve automatic setting of global attention mask
* fix longformer bug
* fix global attention mask in test
* fix global attn mask flatten
* fix slow tests
* update docstring
* update docs and make more robust
* improve attention mask
* add multiple choice for longformer
* add models to docs
* adapt docstring
* add test to longformer
* add longformer for mc in init and modeling auto
* fix tests
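A hedged sketch of how a global attention mask is supplied (checkpoint name is an assumption; per the commits above, the models now also build a sensible mask automatically when none is given):
```
import torch
from transformers import LongformerModel, LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("A very long document ...", return_tensors="pt")
global_attention_mask = torch.zeros_like(inputs.input_ids)
global_attention_mask[:, 0] = 1  # give the first (<s>) token global attention
outputs = model(**inputs, global_attention_mask=global_attention_mask)
```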
The option `--do_lower_case` is currently required by the uncased models (i.e., bert-base-uncased, bert-large-uncased).
Results:
BERT-BASE without --do_lower_case: 'exact': 73.83, 'f1': 82.22
BERT-BASE with --do_lower_case: 'exact': 81.02, 'f1': 88.34
* make transformers-cli cross-platform
Using "scripts" is a useful option in setup.py particularly when you want to get access to non-python scripts. However, in this case we want to have an entry point into some of our own Python scripts. To do this in a concise, cross-platfom way, we can use entry_points.console_scripts. This change is necessary to provide the CLI on different platforms, which "scripts" does not ensure. Usage remains the same, but the "transformers-cli" script has to be moved (be part of the library) and renamed (underscore + extension)
* make style & quality
* added LongformerForQuestionAnswering
* add LongformerForQuestionAnswering
* fix import for LongformerForMaskedLM
* add LongformerForQuestionAnswering
* hardcoded sep_token_id
* compute attention_mask if not provided
* combine global_attention_mask with attention_mask when provided
* update example in docstring
* add assert error messages, better attention combine
* add test for longformerForQuestionAnswering
* typo
* cast global_attention_mask to long
* make style
* Update src/transformers/configuration_longformer.py
* Update src/transformers/configuration_longformer.py
* fix the code quality
* Merge branch 'longformer-for-question-answering' of https://github.com/patil-suraj/transformers into longformer-for-question-answering
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add Type Hints to modeling_utils.py. Closes #3911
Add Type Hints to methods in `modeling_utils.py`
Note: The coverage isn't 100%. Mostly skipped internal methods.
* Reformat according to `black` and `isort`
* Use typing.Iterable instead of Sequence
* Parameterize Iterable by its generic type
* Use typing.Optional when None is the default value
* Adhere to style guideline
* Update src/transformers/modeling_utils.py
* Update src/transformers/modeling_utils.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Warn the user about max_len being on the path to be deprecated.
* Ensure better backward compatibility when max_len is provided to a tokenizer.
* Make sure to override the parameter and not the actual instance value.
* Format & quality
* Adds predict stage for glue tasks, and generates result files which can be submitted to the gluebenchmark.com website.
* Use Split enum + always output the label name
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* first commit
* bug fixes
* better examples
* undo padding
* remove wrong VOCAB_FILES_NAMES
* License
* make style
* make isort happy
* unit tests
* integration test
* make `black` happy by undoing `isort` changes!!
* lint
* no need for the padding value
* batch_size not bsz
* remove unused type casting
* seqlen not seq_len
* staticmethod
* `bert` selfattention instead of `n2`
* uint8 instead of bool + lints
* pad inputs_embeds using embeddings not a constant
* black
* unit test with padding
* fix unit tests
* remove redundant unit test
* upload model weights
* resolve todo
* simpler _mask_invalid_locations without lru_cache + backward compatible masked_fill_
* increase unittest coverage
* Distributed eval: SequentialDistributedSampler + gather all results
* For consistency only write to disk from world_master
Close https://github.com/huggingface/transformers/issues/4272
* Working distributed eval
* Hook into scripts
* Fix#3721 again
* TPU.mesh_reduce: stay in tensor space
Thanks @jysohn23
* Just a small comment
* whitespace
* torch.hub: pip install packaging
* Add test scenarios
* makes fetching the last learning rate in trainer backward compatible
* split comment to multiple lines
* fixes black styling issue
* uses version to create a more explicit logic
- add a citation.
- modify the table of the BLUE benchmark.
The table of the first version was not displayed correctly on https://huggingface.co/seiya/oubiobert-base-uncased.
Could you please confirm that this fix will allow you to display it correctly?
* Adding optimizations block from ONNXRuntime.
* Turn off external data format by default for PyTorch export.
* Correct the way use_external_format is passed through the cmdline args.
* Add index to be returned by NerPipeline to allow for the creation of entity groups
* Add entity groups
* Convert entity list to dict
* Add entity to entity_group_disagg after updating entity groups
* Change 'group' parameter to 'grouped_entities'
* Add unit tests for grouped NER pipeline case
* Correct variable name typo for NER_FINETUNED_MODELS
* Sync grouped tests to recent test updates
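A hedged usage sketch of the grouped_entities option (the default model selection, example text and printed output shape are illustrative only):
```
from transformers import pipeline

ner = pipeline("ner", grouped_entities=True)
print(ner("Hugging Face is based in New York City"))
# e.g. [{"entity_group": "ORG", "word": "Hugging Face", ...},
#       {"entity_group": "LOC", "word": "New York City", ...}]
```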
* Added generic ONNX conversion script for PyTorch model.
* WIP initial TF support.
* TensorFlow/Keras ONNX export working.
* Print framework version info
* Add possibility to check the model is correctly loading on ONNX runtime.
* Remove quantization option.
* Specify ONNX opset version when exporting.
* Formatting.
* Remove unused imports.
* Make functions more generally reusable from other part of the code.
* isort happy.
* flake happy
* Export only feature-extraction for now
* Correctly check inputs order / filter before export.
* Removed task variable
* Fix invalid args call in load_graph_from_args.
* Fix invalid args call in convert.
* Fix invalid args call in infer_shapes.
* Raise exception and catch in caller function instead of exit.
* Add 04-onnx-export.ipynb notebook
* More WIP on the notebook
* Remove unused imports
* Simplify & remove unused constants.
* Export with constant_folding in PyTorch
* Let's try to put function args in the right order this time ...
* Disable external_data_format temporary
* ONNX notebook draft ready.
* Updated notebooks charts + wording
* Correct error while exporting last chart in notebook.
* Addressing @LysandreJik's comment.
* Set ONNX opset to 11 as default value.
* Set opset param mandatory
* Added ONNX export unittests
* Quality.
* flake8 happy
* Add keras2onnx dependency on extras["tf"]
* Pin keras2onnx on github master to v1.6.5
* Second attempt.
* Third attempt.
* Use the right repo URL this time ...
* Do the same for onnxconverter-common
* Added keras2onnx and onnxconverter-common at 1.7.0 to support TF 2.2
* Correct commit hash.
* Addressing PR review: Optimizations are enabled by default.
* Addressing PR review: small changes in the notebook
* setup.py comment about keras2onnx versioning.
* Add QA trainer example for TF
* Make data_dir optional
* Fix parameter logic
* Fix feature convert
* Update the READMEs to add the question-answering task
* Apply style
* Change 'sequence-classification' to 'text-classification' and prefix with 'eval' all the metric names
* Apply style
* Apply style
* Improvements to the wandb integration
* small reorg + no global necessary
* feat(trainer): log epoch and final metrics
* Simplify logging a bit
* Fixup
* Fix crash when just running eval
Co-authored-by: Chris Van Pelt <vanpelt@gmail.com>
Co-authored-by: Boris Dayma <boris.dayma@gmail.com>
* Allow BatchEncoding to be initialized empty.
This is required by recent changes introduced in TF 2.2.
* Attempt to unpin Tensorflow to 2.2 with the previous commit.
* catch gpu len 1 set to gpu0
* Add mpc to trainer
* Add MPC for TF
* fix TF automodel for MPC and add Albert
* Apply style
* Fix import
* Note to self: double check
* Make shape None, None for datasetgenerator output shapes
* Add from_pt bool which doesn't seem to work
* Original checkpoint dir
* Fix docstrings for automodel
* Update readme and apply style
* Colab should probably not be from users
* Colabs should probably not be from users
* Add colab
* Update README.md
* Update README.md
* Cleanup __init__
* Cleanup flake8 trailing comma
* Update src/transformers/training_args_tf.py
* Update src/transformers/modeling_tf_auto.py
Co-authored-by: Viktor Alm <viktoralm@pop-os.localdomain>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* simplify cache vars and allow for TRANSFORMERS_CACHE env
As it currently stands, "TRANSFORMERS_CACHE" is not an accepted variable. It seems that these variables were not updated when moving from pytorch_transformers to transformers. In addition, the fallback procedure could be improved and simplified. Pathlib seems redundant here.
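A hedged sketch of the simplified fallback chain being proposed (the variable names other than TRANSFORMERS_CACHE mirror the older package names and are assumptions):
```
import os

torch_cache_home = os.getenv(
    "TORCH_HOME",
    os.path.join(os.getenv("XDG_CACHE_HOME", os.path.expanduser("~/.cache")), "torch"),
)
default_cache_path = os.path.join(torch_cache_home, "transformers")

# The newest name wins; older names are kept as fallbacks for backward compatibility.
cache_dir = os.getenv(
    "TRANSFORMERS_CACHE",
    os.getenv(
        "PYTORCH_TRANSFORMERS_CACHE",
        os.getenv("PYTORCH_PRETRAINED_BERT_CACHE", default_cache_path),
    ),
)
```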
* Update file_utils.py
"Migrating from pytorch-transformers to transformers" is missing in the main document. It is available in the main `readme` thought. Just move it to the document.
* Fix the issue to properly run the accumulator with TF 2.2
* Apply style
* Fix training_args_tf for TF 2.2
* Fix the TF training args when only one GPU is available
* Remove the fixed version of TF in setup.py
* example updated to use generation pipeline
* Update model_cards/LorenzoDeMattei/GePpeTto/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Created using Colaboratory
* [examples] reorganize files
* remove run_tpu_glue.py as superseded by TPU support in Trainer
* Bugfix: int, not tuple
* move files around
* Rewritten batch support in pipelines.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix imports sorting 🔧
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Set pad_to_max_length=True by default on Pipeline.
* Set pad_to_max_length=False for generation pipelines.
Most generation models don't have a padding token.
* Address @joeddav review comment: Uniformized *args.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Address @joeddav review comment: Uniformized *args (second).
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* first copy & paste commit from Bert and Morgan's LSH code
* add easy way to compare to trax original code
* translate most of function
* make trax lsh self attention deterministic with numpy seed + copy paste code
* add same config
* add same config
* make layer init work
* implemented hash_vectors function for lsh attention
* continue reformer translation
* hf LSHSelfAttentionLayer gives same output as trax layer
* refactor code
* refactor code
* refactor code
* refactor
* refactor + add reformer config
* delete bogus file
* split reformer attention layer into two layers
* save intermediate step
* save intermediate step
* make test work
* add complete reformer block layer
* finish reformer layer
* implement causal and self mask
* clean reformer test and refactor code
* fix merge conflicts
* fix merge conflicts
* update init
* fix device for GPU
* fix chunk length init for tests
* include Morgan's optimization
* improve memory a bit
* improve comment
* factorize num_buckets
* better testing parameters
* make whole model work
* make lm model work
* add t5 copy paste tokenizer
* add chunking feed forward
* clean config
* add improved assert statements
* make tokenizer work
* improve test
* correct typo
* extend config
* add a more complex test
* add new axial position embeddings
* add local block attention layer
* clean tests
* refactor
* better testing
* save intermediate progress
* clean test file
* make shorter input length work for model
* allow variable input length
* refactor
* make forward pass for pretrained model work
* add generation possibility
* finish dropout and init
* make style
* refactor
* add first version of RevNet Layers
* make forward pass work and add convert file
* make uploaded model forward pass work
* make uploaded model forward pass work
* refactor code
* add namedtuples and cache buckets
* correct head masks
* refactor
* made reformer more flexible
* make style
* remove set max length
* add attention masks
* fix up tests
* fix lsh attention mask
* make random seed optional for the moment
* improve memory in reformer
* add tests
* make style
* make sure masks work correctly
* detach gradients
* save intermediate
* correct backprop through gather
* make style
* change back num hashes
* rename to labels
* fix rotation shape
* fix detach
* update
* fix trainer
* fix backward dropout
* make reformer more flexible
* fix conflict
* fix
* fix
* add tests for fixed seed in reformer layer
* fix trainer typo
* fix typo in activations
* add fp16 tests
* add fp16 training
* support fp16
* correct gradient bug in reformer
* add fast gelu
* re-add dropout for embedding dropout
* better naming
* better naming
* renaming
* finalize test branch
* finalize tests
* add more tests
* finish tests
* fix
* fix type trainer
* fix fp16 tests
* fix tests
* fix tests
* fix tests
* fix issue with dropout
* fix dropout seeds
* correct random seed on gpu
* finalize random seed for dropout
* finalize random seed for dropout
* remove duplicate line
* correct half precision bug
* make style
* refactor
* refactor
* docstring
* remove sinusoidal position encodings for reformer
* move chunking to modeling_utils
* make style
* clean config
* make style
* fix tests
* fix auto tests
* pretrained models
* fix docstring
* update conversion file
* Update pretrained_models.rst
* fix rst
* fix rst
* update copyright
* fix test path
* fix test path
* fix small issue in test
* include reformer in generation tests
* add docs for axial position encoding
* finish docs
* Update convert_reformer_trax_checkpoint_to_pytorch.py
* remove isort
* include Sam's comments
* remove wrong comment in utils
* correct typos
* fix typo
* Update reformer.rst
* applied Morgan's optimization
* make style
* make gpu compatible
* remove bogus file
* big test refactor
* add example for chunking
* fix typo
* add to README
* First commit to add a TF version of the trainer.
* Make the TF trainer closer to what the PT trainer looks like
* Refactoring common code between the PT and TF trainer into a util file.
* Some bugfix + better similarity with the PT trainer
* Add missing class in transformers init
* Bugfix over prediction + use classification report instead of simple metrics
* Fix name error
* Fix optimization tests + style
* Apply style
* Several bugfix for multi-gpu training
* Apply style
* Apply style
* Add glue example for the TF trainer
* Several bugfixes + address the reviews
* Fix on the TF training args file
* Add a debug mode
* Bugfix in utils_ner.py when segment_ids is None
* Apply style
* Apply style
* Add TPU strategy
* Fix selection strategy
There's an inconsistency right now where:
- we load some models into CACHE_DIR
- and some models in the default cache
- and often, in both for the same models
When running the RUN_SLOW tests, this takes a lot of disk space, time, and bandwidth.
I'd rather always use the default cache
* Update sqrt computation so it can survive a torch.jit.trace
* Update modeling_gpt2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Add GenerationPipeline
* Fix parameter names
* Correct parameter __call__ parameters
* Add model type attribute and correct function calls for prepare_input
* Take out trailing commas from init attributes
* Remove unnecessary tokenization line
* Implement support for multiple text inputs
* Apply generation support for multiple input text prompts
* Take out tensor coercion
* Take out batch index
* Add text prompt to return sequence
* Squeeze token tensor before decoding
* Return only a single list of sequences if only one prompt was used
* Correct results variable name
* Add GenerationPipeline to SUPPORTED_TASKS with the alias , initialized with GPT2
* Registered AutoModelWithLMHead for both pt and tf
* Update docstring for GenerationPipeline
* Add kwargs parameter to mode.generate
* Take out kwargs parameter after all
* Add generation pipeline example in pipeline docstring
* Fix max length by squeezing tokens tensor
* Apply ensure_tensor_on_device to pytorch tensor
* Include generation step in torch.no_grad
* Take out input from prepare_xlm_input and set 'en' as default xlm_language
* Apply framework specific encoding during prepare_input
* Format w make style
* Move GenerationPipeline import to follow proper import sorting
* Take out trailing comma from generation dict
* Apply requested changes
* Change name to TextGenerationPipeline
* Apply TextGenerationPipeline rename to __init__
* Changing alias to
* Set input mapping as input to ensure_tensor_on_device
* Fix assertion placement
* Add test_text_generation
* Add TextGenerationPipeline to PipelineCommonTests
* Take out whitespace
* Format __init__ w black
* Fix __init__ style
* Format __init__
* Add line to end of __init__
* Correct model tokenizer set for test_text_generation
* Ensure to return a list of lists, not a list of strings (to pass the test)
* Limit test models to only 3 to limit runtime to address circleCI timeout error
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update tests/test_pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Remove argument docstring, __init__, add additional __call__ arguments, and reformat results to list of dict
* Fix blank result list
* Add TextGenerationPipeline to pipelines.rst
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Fix typos from adding PADDING_TEXT_TOKEN_LENGTH
* Fix incorrectly moved result list
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>
* Add back generation line and make style
* Take out blank whitespace
* Apply new alis, text-generation, to test_pipelines
* Fix text generation alias in test
* Update src/transformers/pipelines.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* doc
* [tests] Add sample files for a regression task
* [HUGE] Trainer
* Feedback from @sshleifer
* Feedback from @thomwolf + logging tweak
* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes
* [glue] Use default max_seq_length of 128 like before
* [glue] move DataTrainingArguments around
* [ner] Change interface of InputExample, and align run_{tf,pl}
* Re-align the pl scripts a little bit
* ner
* [ner] Add integration test
* Fix language_modeling with API tweak
* [ci] Tweak loss target
* Don't break console output
* amp.initialize: model must be on right device before
* [multiple-choice] update for Trainer
* Re-align to 827d6d6ef071029cfe82838a18dab046b5813976
* First pass on utility classes and python tokenizers
* finishing cleanup pass
* style and quality
* Fix tests
* Updating following @mfuntowicz comment
* style and quality
* Fix Roberta
* fix batch_size/seq_length in BatchEncoding
* add alignment methods + tests
* Fix OpenAI and Transfo-XL tokenizers
* adding trim_offsets=True default for GPT2 and RoBERTa
* style and quality
* fix tests
* add_prefix_space in roberta
* bump up tokenizers to rc7
* style
* unfortunately tensorflow does not like these - removing shape/seq_len for now
* Update src/transformers/tokenization_utils.py
Co-Authored-By: Stefan Schweter <stefan@schweter.it>
* Adding doc and docstrings
* making flake8 happy
Co-authored-by: Stefan Schweter <stefan@schweter.it>
token_type_id is converted into the segment embedding. For question answering,
this needs to highlight whether a token belongs to sequence 0 or 1.
encode_plus takes care of correctly setting this parameter automatically.
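A hedged illustration (model name is an assumption): encoding a question/context pair sets token_type_ids automatically.
```
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer.encode_plus("Who wrote it?", "It was written by someone.")
print(enc["token_type_ids"])  # 0s for the question segment, 1s for the context segment
```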
* Refactored use of newstest2013 to newstest2014. Fixed a bug where argparse consumed the first command-line argument as the model_size argument rather than using the default model_size, by forcing explicit --model_size flag inclusion
* More pythonic file handling through 'with' context
* COSMETIC - ran Black and isort
* Fixed reference to number of lines in newstest2014
* Fixed failing test. More pythonic file handling
* finish PR from tholiao
* remove outcommented lines
* make style
* make isort happy
Co-authored-by: Thomas Liao <tholiao@gmail.com>
* remove output_past from pt
* make style
* add optional input length for gpt2
* add use cache to prepare input
* save memory in gpt2
* correct gpt2 test inputs
* make past input optional for gpt2
* finish use_cache for all models
* make style
* delete modeling_gpt2 change in test file
* correct docstring
* correct is true statements for gpt2
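A hedged sketch of the use_cache/past mechanism described above (checkpoint name is an assumption; newer releases call the cache argument past_key_values, while at the time of this change it was past):
```
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer.encode("Hello, my name", return_tensors="pt")
logits, past = model(input_ids, use_cache=True)[:2]
next_token = logits[:, -1].argmax(dim=-1, keepdim=True)
# Later steps only feed the newly generated token plus the cached key/values.
logits, past = model(next_token, past_key_values=past, use_cache=True)[:2]
```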
* added model_cards for polish squad models
* corrected mistake in polish design cards
* updated model_cards for squad2_dutch model
* added links to benchmark models
Co-authored-by: Henryk Borzymowski <henryk.borzymowski@pwc.com>
* Initial commit to get BERT + run_glue.py on TPU
* Add README section for TPU and address comments.
* Cleanup TPU bits from run_glue.py (#3)
TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.
We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.
* Cleanup TPU bits from run_glue.py
TPU runner is currently implemented in:
https://github.com/pytorch-tpu/transformers/blob/tpu/examples/run_glue_tpu.py.
We plan to upstream this directly into `huggingface/transformers`
(either `master` or `tpu`) branch once it's been more thoroughly tested.
* No need to call `xm.mark_step()` explicitly (#4)
Since for gradient accumulation we're accumulating on batches from
`ParallelLoader` instance which on next() marks the step itself.
* Resolve R/W conflicts from multiprocessing (#5)
* Add XLNet in list of models for `run_glue_tpu.py` (#6)
* Add RoBERTa to list of models in TPU GLUE (#7)
* Add RoBERTa and DistilBert to list of models in TPU GLUE (#8)
* Use barriers to reduce duplicate work/resources (#9)
* Shard eval dataset and aggregate eval metrics (#10)
* Shard eval dataset and aggregate eval metrics
Also, instead of calling `eval_loss.item()` every time do summation with
tensors on device.
* Change defaultdict to float
* Reduce the pred, label tensors instead of metrics
As brought up during review, some metrics like f1 cannot be aggregated
via averaging. GLUE task metrics depend largely on the dataset, so
instead we sync the prediction and label tensors so that the metrics can
be computed accurately on those instead.
* Only use tb_writer from master (#11)
* Apply huggingface black code formatting
* Style
* Remove `--do_lower_case` as example uses cased
* Add option to specify tensorboard logdir
This is needed for our testing framework which checks regressions
against key metrics written by the summary writer.
* Using configuration for `xla_device`
* Prefix TPU specific comments.
* num_cores clarification and namespace eval metrics
* Cache features file under `args.cache_dir`
Instead of under `args.data_dir`. This is needed as our test infra uses
data_dir with a read-only filesystem.
* Rename `run_glue_tpu` to `run_tpu_glue`
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* [examples] Generate argparsers from type hints on dataclasses
* [HfArgumentParser] way simpler API
* Restore run_language_modeling.py for easier diff
* [HfArgumentParser] final tweaks from code review
* Big cleanup of `glue_convert_examples_to_features`
* Use batch_encode_plus
* Cleaner wrapping of glue_convert_examples_to_features for TF
@lysandrejik
* Cleanup syntax, thanks to @mfuntowicz
* Raise explicit error in case of user error
* Optimize causal mask using torch.where
Instead of multiplying by a 1.0 float mask, use torch.where with a bool mask for increased performance.
* Maintain compatibility with torch 1.0.0 - thanks for PR feedback
* Fix typo
* reformat line for CI
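A hedged illustration of the optimization described above (tensor names and sizes are made up for the example):
```
import torch

scores = torch.randn(1, 4, 4)                             # raw attention scores
causal = torch.tril(torch.ones(4, 4, dtype=torch.bool))   # lower-triangular bool mask

# Before: scores * mask - 1e4 * (1.0 - mask) with a float 0/1 mask.
# After: select directly with the boolean mask.
masked = torch.where(causal, scores, torch.full_like(scores, -1e4))
```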
* Renamed num_added_tokens to num_special_tokens_to_add
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Cherry-Pick: Partially fix space only input without special tokens added to the output #3091
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added property is_fast on PretrainedTokenizer and PretrainedTokenizerFast
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Make fast tokenizers unittests work on Windows.
* Entirely refactored unittest for tokenizers fast.
* Remove ABC class for CommonFastTokenizerTest
* Added embeded_special_tokens tests from allenai @dirkgr
* Make embeded_special_tokens tests from allenai more generic
* Uniformize vocab_size as a property for both Fast and normal tokenizers
* Move special tokens handling out of PretrainedTokenizer (SpecialTokensMixin)
* Ensure providing None input raises the same ValueError as the Python tokenizer + tests.
* Fix invalid input for assert_padding when testing batch_encode_plus
* Move add_special_tokens from constructor to tokenize/encode/[batch_]encode_plus methods parameter.
* Ensure tokenize() correctly forward add_special_tokens to rust.
* Adding None checking on top of encode / encode_batch for TransfoXLTokenizerFast.
Avoid stripping on None values.
* unittests ensure tokenize() also throws a ValueError if provided None
* Added add_special_tokens unittest for all supported models.
* Style
* Make sure TransfoXL test run only if PyTorch is provided.
* Split up tokenizers tests for each model type.
* Fix invalid unittest with new tokenizers API.
* Filter out Roberta openai detector models from unittests.
* Introduce BatchEncoding on fast tokenizers path.
This new structure exposes all the mappings retrieved from Rust.
It also keeps the current behavior with model forward.
* Introduce BatchEncoding on slow tokenizers path.
Backward compatibility.
* Improve error message on BatchEncoding for slow path
* Make add_prefix_space True by default on Roberta fast to match Python in majority of cases.
* Style and format.
* Added typing on all methods for PretrainedTokenizerFast
* Style and format
* Added path for feeding pretokenized (List[str]) input to PretrainedTokenizerFast.
* Style and format
* encode_plus now supports pretokenized inputs.
* Remove user warning about add_special_tokens when working on pretokenized inputs.
* Always go through the post processor.
* Added support for pretokenized input pairs on encode_plus
* Added is_pretokenized flag on encode_plus for clarity and improved error message on input TypeError.
* Added pretokenized inputs support on batch_encode_plus
* Update BatchEncoding methods name to match Encoding.
* Bump setup.py tokenizers dependency to 0.7.0rc1
* Remove unused parameters in BertTokenizerFast
* Make sure Roberta returns token_type_ids for unittests.
* Added missing typings
* Update add_tokens prototype to match tokenizers side and allow AddedToken
* Bumping tokenizers to 0.7.0rc2
* Added documentation for BatchEncoding
* Added (unused) is_pretokenized parameter on PreTrainedTokenizer encode_plus/batch_encode_plus methods.
* Added higher-level typing for tokenize / encode_plus / batch_encode_plus.
* Fix unittests failing because add_special_tokens was defined as a constructor parameter on Rust Tokenizers.
* Fix text-classification pipeline using the wrong tokenizer
* Make pipelines works with BatchEncoding
* Turn off add_special_tokens on tokenize by default.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remove add_prefix_space from tokenize call in unittest.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Style and quality
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Correct message for batch_encode_plus none input exception.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix invalid list comprehension for offset_mapping overriding content every iteration.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* TransfoXL uses Strip normalizer.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Bump tokenizers dependency to 0.7.0rc3
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Support AddedTokens for special_tokens and use left stripping on mask for Roberta.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* SpecialTokensMixin can use slots for faster access to underlying attributes.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remove update_special_tokens from fast tokenizers.
* Ensure TransfoXL unittests are run only when torch is available.
* Style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Style
* Style 🙏🙏
* Remove slots on SpecialTokensMixin, need deep dive into pickle protocol.
* Remove Roberta warning on __init__.
* Move documentation to Google style.
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py
`convert_examples_to_features` sets `pad_token=0` by default, which is correct for BERT but incorrect for RoBERTa (`pad_token=1`) and XLNet (`pad_token=5`). I think the other arguments to `convert_examples_to_features` are correct, but it would be helpful if someone more familiar with this part of the codebase checked (see the short sketch below).
* Simplifying change to match recent commits
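A quick way to see why a hardcoded `pad_token=0` breaks non-BERT models; a minimal sketch using the public tokenizer API, not the exact patch:
```python
from transformers import AutoTokenizer

# Look up the pad id from the tokenizer instead of hardcoding 0.
for name in ("bert-base-uncased", "roberta-base", "xlnet-base-cased"):
    tok = AutoTokenizer.from_pretrained(name)
    print(name, "pad_token_id =", tok.pad_token_id)  # 0, 1 and 5 respectively
```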
* add some t5 integration tests
* finish summarization and translation integration tests for T5 - results look good
* add tf test
* fix == vs is bug
* fix tf beam search error and make tf t5 tests pass
* Using loaded checkpoint with --do_predict
Without this fix, I'm getting near-random validation performance for a trained model, and the validation performance differs per validation run. I think this happens since the `model` variable isn't set with the loaded checkpoint, so I'm using a randomly initialized model. Looking at the model activations, they differ each time I run evaluation (but they don't with this fix).
* Update checkpoint loading
* Fixing model loading
* Update the NER TF script to remove the softmax and set the pad token label id to -1
* Reformat the quality and style
Co-authored-by: Julien Plu <julien.plu@adevinta.com>
* make decoder input ids optional for t5 training
* lm_labels should not be shifted in t5
* add tests
* finish shift right functionality for PT T5 (a small sketch follows below)
* move shift right to correct class
* cleaner code
* replace -100 values with pad token id
* add assert statement
* remove unnecessary for loop
* make style
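A minimal sketch of the shift-right behaviour described in the bullets above; this is a standalone helper, not the exact method added to the model, and `pad_token_id` / `decoder_start_token_id` are assumed to come from the model config:
```python
import torch

def shift_right(labels: torch.Tensor, pad_token_id: int, decoder_start_token_id: int) -> torch.Tensor:
    """Shift label ids one position to the right to build the decoder inputs."""
    shifted = labels.new_zeros(labels.shape)
    shifted[..., 1:] = labels[..., :-1].clone()
    shifted[..., 0] = decoder_start_token_id
    # Labels mark ignored positions with -100; decoder inputs must use the pad id instead.
    shifted.masked_fill_(shifted == -100, pad_token_id)
    return shifted

labels = torch.tensor([[42, 43, 44, -100]])
print(shift_right(labels, pad_token_id=0, decoder_start_token_id=0))  # tensor([[ 0, 42, 43, 44]])
```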
* Add clear description of how to train T5
* correct docstring in T5
* correct typo
* correct docstring format
* update t5 model docs
* implement collins feedback
* fix typo and add more explanation for sentinel tokens
* delete unnecessary todos
* force bleu
* fix wrong file name
* rename file
* different filenames for each example test
* test files should clean up after themselves
* test files should clean up after themselves
* do not force bleu
* correct typo
* fix isort
* Use tokenizer.num_added_tokens to count number of added special_tokens instead of hardcoded numbers.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* run_ner.py - Do not add a label to the labels_ids if word_tokens is empty.
This can happen when using bert-base-multilingual-cased with an input containing a unique space.
In this case, the tokenizer outputs an empty word_tokens list, which leads to inconsistent behavior:
labels_ids ends up with one more entry than the tokens vector.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add the missing token classification for XLM
* fix styling
* Add XLMForTokenClassification to AutoModelForTokenClassification class
* Fix docstring typo for non-existing class
* Add the missing token classification for XLM
* fix styling
* fix styling
* Add XLMForTokenClassification to AutoModelForTokenClassification class
* Fix docstring typo for non-existing class
* Add missing description for AlbertForTokenClassification
* fix styling
* Add missing docstring for AlBert
* Slow tests should be slow
Co-authored-by: Sakares Saengkaew <s.sakares@gmail.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* [ci] Also run test_examples in py37
(will revert at the end of the experiment)
* InputExample: use immutable dataclass
* [deps] Install dataclasses for Py<3.7
* [skip ci] Revert "[ci] Also run test_examples in py37"
This reverts commit d29afd9959786b77759b0b8fa4e6b4335b952015.
The CONTRIBUTING file pins to a specific version of isort, so we might as well install that in `dev`. This makes it easier for contributors so they don't have to manually install the specific commit.
I found there are two grammar errors or typo issues in the explanation of the encoding properties.
The original sentences:
If your was made of multiple \"parts\" such as (question, context), then this would be a vector with for each token the segment it belongs to
If your has been truncated into multiple subparts because of a length limit (for BERT for example the sequence length is limited to 512), this will contain all the remaining overflowing parts.
I think "input" should be inserted after the phrase "If your".
* fix conflicts
* update bart max length test
* correct spelling mistakes
* implemented model specific encode function
* fix merge conflicts
* better naming
* save intermediate state -> need to rethink structure a bit
* leave tf problem as it is for now
* current version
* add layers.pop
* remove ipdb
* make style
* clean return cut decoding
* remove ipdbs
* Fix restoring layers in the decoders that don't exist.
* push good intermediate solution for now
* fix conflicts
* always good to refuse to merge conflicts when rebasing
* fix small bug
* improve function calls
* remove unused file
* add correct scope behavior for t5_generate
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
For the tutorial of "How to generate text", the URL link was wrong (it was linked to the tutorial of "How to train a language model").
I fixed the URL.
* added return_token_type_ids argument for tokenizers which do not generate token_type_ids by default
* fixed styling
* Style
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
* first commit
* work in progress
* make language generation task pass
* update to working version for LM
* delete print
* remove dead code
* make style
* passing
* Undo stupid chg
* docs
* undo rename
* delete-cruft
* only import if you have torch
* Dont rely on dict ordering
* Fix dict ordering upstream
* docstring link
* docstring link
* remove trailing comma for 3.5 compat
* new name
* delegate kwarging
* Update kwargs
* ✨ Alter base pl transformer to use automodels
* 🐛 Add batch size env variable to function call
* 💄 Apply black code style from Makefile
* 🚚 Move lightning base out of ner directory
* ✨ Add lightning glue example
* 💄 self
* move _feature_file to base class
* ✨ Move eval logging to custom callback
* 💄 Apply black code style
* 🐛 Add parent to pythonpath, remove copy command
* 🐛 Add missing max_length kwarg
* memory benchmark rss
* have both forward pass and line-by-line mem tracing
* cleaned up tracing
* refactored and cleaning up API
* no f-strings yet...
* add GPU mem logging
* fix GPU memory monitoring
* style and quality
* clean up and doc
* update with comments
* Switching to python 3.6+
* fix quality
This model card is intended to be shared among all models under google/bert_uncased_*
(We'll need some support from HuggingFace to get this card cross-linked from all models)
* Add TF2 version of FlauBERT
* Add TF2 version of FlauBERT
* Add documentation
* Apply style and quality
* Apply style once again
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* add empty model cards for every current DeepPavlov model
* fix: replace cyrillic `с` with `c`
* docs: add model cards for current DeepPavlov BERT models
* docs: add links for arXiv preprints
Be explicit that this is config for the transformers package (as these
layers may coexist with other custom stuff in a Keras model, plus the
Keras container itself is called config, and config["config"] is not
great)
Add explicit error handling for initializer calls that have neither
the `config` nor the `transformers_config` argument, or have both.
This was the beginning of an attempt to address the test failure on
this layer, and instead I backed out of making this layer
keras-serializable at all ... so it was a mistake to commit this.
* Added transformers-pytorch-cpu and gpu Docker images
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added automatic jupyter launch for Docker image.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Move image from alpine to Ubuntu to align with NVidia container images.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added TRANSFORMERS_VERSION argument to Dockerfile.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added Pytorch-GPU based Docker image
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added Tensorflow images.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use python 3.7 as Tensorflow doesn't provide a 3.8-compatible wheel.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remove double FROM instructions on transformers-pytorch-cpu image.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added transformers-tensorflow-gpu Docker image.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* use the correct ubuntu version for tensorflow-gpu
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added pipelines example notebook
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added transformers-cpu and transformers-gpu (including both PyTorch and TensorFlow) images.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Docker images don't start jupyter notebook by default.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Tokenizers notebook
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Update images links
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Update Docker images to python 3.7.6 and transformers 2.5.1
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added 02-transformers notebook.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Trying to realign 02-transformers notebook ?
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added Transformer image schema
* Some tweaks on tokenizers notebook
* Removed old notebooks.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Attempt to provide table of content for each notebooks
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Second attempt.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Reintroduce transformer image.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Keep trying
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* It's going to fly !
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remaining of the Table of Content
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix inlined elements for the table of content
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Removed anaconda dependencies for Docker images.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Removing notebooks ToC
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added LABEL to each docker image.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Removed old Dockerfile
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Directly use the context and include transformers from here.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Reduce overall size of compiled Docker images.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Install jupyter by default and use CMD for easier launching of the images.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Reduce number of layers in the images.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added README.md for notebooks.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix notebooks link in README
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix some wording issues.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added blog notebooks too.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing spelling errors in review comments.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Co-authored-by: MOI Anthony <xn1t0x@gmail.com>
When supplied by Keras deserialization, the config parameter to initializers
will be a dict. So intercept it and convert to PretrainedConfig object (and
store in instance attribute for get_config to get at it) before passing to the
actual initializer. To accomplish this, and repeat as little code as possible,
use a class decorator on TF*MainLayer classes.
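A rough sketch of the decorator idea; the decorator name and the stand-in config class below are illustrative, not the library's actual implementation:
```python
import functools

class DummyConfig:
    """Stand-in for a PretrainedConfig-like object."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def to_dict(self):
        return dict(self.__dict__)

def keras_serializable(cls):
    """Let a main layer accept either a config object or the plain dict Keras passes back on deserialization."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def wrapped_init(self, config, *args, **kwargs):
        if isinstance(config, dict):
            config = DummyConfig(**config)
        self._config = config  # kept so get_config() can serialize it again
        original_init(self, config, *args, **kwargs)

    cls.__init__ = wrapped_init
    cls.get_config = lambda self: self._config.to_dict()
    return cls

@keras_serializable
class MyMainLayer:
    def __init__(self, config):
        self.hidden_size = config.hidden_size

layer = MyMainLayer({"hidden_size": 8})  # dict coming back from Keras deserialization
print(layer.get_config())                # {'hidden_size': 8}
```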
* Rename and improve example
* Add test
* slightly faster test
* style
* This breaks remy prolly
* shorter test string
* no slow
* newdir structure
* New tree
* Style
* shorter
* docs
* clean
* Attempt future import
* more import hax
* add first copy past test to tf 2 generate
* add tf top_k_top_p_filter fn
* add generate function for TF
* add generate function for TF
* implemented generate for all models except transfoXL
* implemented generate for all models except transfoXL
* implemented generate for all models except transfoXL
* make style
* change permission of test file to correct ones
* delete ipdb
* delete ipdb
* fix bug and finish simple gpt2 integration test
* clean test file
* clean test file
* make style
* make style
* make style
* make style
* change import style
* change import style
* make style
* make style
* add decorators
* add decorators
* fix tf ctrl bug dim => axis in TF
* make style
* make style
* refactored test file
* refactored test file
* take out test_torch_tf_conversion if nothing is defined
* take out test_torch_tf_conversion if nothing is defined
* remove useless files
* remove useless files
* fix conflicts
* fix conflicts
* fix conflicts
* fix conflicts
* fix conflicts
* solve conflicts
* solve conflicts
* fix conflicts
* fix conflicts
* merge conflicts
* delete ipdb
* exposed top_k_top_p_filtering fns
* delete weirdly created w! file
* add comment to test tf common modeling
* fix conflicts
* fix conflicts
* make style
* merge conflicts
* make style
* change tf.tensor.shape to shape_list(tensor)
* Pipeline doc initial commit
* pipeline abstraction
* Remove modelcard argument from pipeline
* Task-specific pipelines can be instantiated with no model or tokenizer
* All pipelines doc
* Create self-hosted.yml
* Update self-hosted.yml
* Update self-hosted.yml
* Update self-hosted.yml
* Update self-hosted.yml
* Update self-hosted.yml
* do not run slow tests, for now
* [ci] For comparison with circleci, let's also run CPU-tests
* [ci] reorganize
* clearer filenames
* [ci] Final tweaks before merging
* rm slow tests on circle ci
* Trigger CI
* On GPU this concurrency was way too high
* Added support for Albert when fine-tuning for NER
* Added support for Albert in NER pipeline
* Added command-line options to examples/ner/run_ner.py to better control tokenization
* Added class AlbertForTokenClassification
* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
* Added ,
* Now passes style guide enforcement
* Changes from reviews.
* Code now passes style enforcement
* Added test for AlbertForTokenClassification
* Added test for AlbertForTokenClassification
* Usage: Sequence Classification & Question Answering
* Pipeline example
* Language modeling
* TensorFlow code for Sequence classification
* Custom TF/PT toggler in docs
* QA + LM for TensorFlow
* Finish Usage for both PyTorch and TensorFlow
* Addressing Julien's comments
* More assertive
* cleanup
* Favicon
- added favicon option in conf.py along with the favicon image
- updated 🤗 logo. Slightly smaller and should appear more consistent across editing programs (no more tongue on the outside of the mouth)
Co-authored-by: joshchagani <joshua@joshuachagani.com>
* Renamed file generated by tokenizers when calling save_pretrained to match python.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added save_vocabulary tests.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remove python quick and dirty fix for clean Rust impl.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Bump tokenizers dependency to 0.5.1
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* TransfoXLTokenizerFast uses a json vocabulary file + warning about incompatibility between Python and Rust
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added some save_pretrained / from_pretrained unittests.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Update tokenizers to 0.5.2
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Quality and format.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* flake8
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Making sure there is really a bug in unittest
* Fix TransfoXL constructor vocab_file / pretrained_vocab_file mixin.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* add preprocessing to add space before punctuation for transfo_xl
* improve warning messages
* make style
* compile regex at instantiation of the tokenizer object
* Add disable_outgoing to pretrained items
Setting disable_outgoing=True disables outgoing traffic:
- etags are not looked up
- models are not downloaded
* parameter name change
* Remove forgotten print
* Testing that encode_plus and batch_encode_plus behave the same way
Spoiler alert: they don't
* Testing rest of arguments in batch_encode_plus
* Test tensor return in batch_encode_plus
* Addressing Sam's comments
* flake8
* Simplified with `num_added_tokens`
* Added support for Albert in NER pipeline
* Added command-line options to examples/ner/run_ner.py to better control tokenization
* Added class AlbertForTokenClassification
* Changed output for NerPipeline to use .convert_ids_to_tokens(...) instead of .decode(...) to better reflect tokens
- I added an example using the model with pipelines to show that we have set `{"use_fast": False}` in the tokenizer.
- I added a Colab to play with the model and pipelines
- I added a Colab to discover Huggingface pipelines at the end of the document
* enable_padding should pad up to max_length if set.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added more testing on padding.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* improving generation
* finalized special token behaviour for no_beam_search generation
* solved modeling_utils merge conflict
* solve merge conflicts in modeling_utils.py
* add run_generation improvements from PR #2749
* adapted language generation to not use hardcoded -1 if no padding token is available
* remove the -1 removal as hard coded -1`s are not necessary anymore
* add lightweight language generation testing for randomly initialized models - just checking whether no errors are thrown
* add slow language generation tests for pretrained models using hardcoded output with pytorch seed
* delete ipdb
* check that all generated tokens are valid
* renaming
* renaming Generation -> Generate
* make style
* updated so that generate_beam_search has the same token behavior as generate_no_beam_search
* consistent return format for run_generation.py
* deleted pretrain lm generate tests -> will be added in another PR
* cleaning of unused if statements and renaming
* run_generate will always return an iterable
* make style
* consistent renaming
* improve naming, make sure generate function always returns the same tensor, add docstring
* add slow tests for all lmhead models
* make style and improve example comments modeling_utils
* better naming and refactoring in modeling_utils
* improving generation
* finalized special token behaviour for no_beam_search generation
* solved modeling_utils merge conflict
* solve merge conflicts in modeling_utils.py
* add run_generation improvements from PR #2749
* adapted language generation to not use hardcoded -1 if no padding token is available
* remove the -1 removal as hard coded -1`s are not necessary anymore
* add lightweight language generation testing for randomly initialized models - just checking whether no errors are thrown
* add slow language generation tests for pretrained models using hardcoded output with pytorch seed
* delete ipdb
* check that all generated tokens are valid
* renaming
* renaming Generation -> Generate
* make style
* updated so that generate_beam_search has the same token behavior as generate_no_beam_search
* consistent return format for run_generation.py
* deleted pretrain lm generate tests -> will be added in another PR
* cleaning of unused if statements and renaming
* run_generate will always return an iterable
* make style
* consistent renaming
* improve naming, make sure generate function always returns the same tensor, add docstring
* add slow tests for all lmhead models
* make style and improve example comments modeling_utils
* better naming and refactoring in modeling_utils
* changed fast random lm generation testing design to more general one
* delete in old testing design in gpt2
* correct old variable name
* temporary fix for encoder_decoder lm generation tests - has to be updated when t5 is fixed
* adapted all fast random generate tests to new design
* better warning description in modeling_utils
* better comment
* better comment and error message
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
* Remove warning when pad_to_max_length is not set.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Move RoBERTa warning to RoBERTa and not GPT2 base tokenizer.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Correctly return the tuple of generated file(s) when calling save_pretrained
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Quality and format.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Override build_inputs_with_special_tokens for fast impl + unittest.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Quality + format.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Implemented fast version of tokenizers
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Bumped tokenizers version requirements to latest 0.2.1
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added matching tests
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Matching OpenAI GPT tokenization !
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Matching GPT2 on tokenizers
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Expose add_prefix_space as constructor parameter for GPT2
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Matching Roberta tokenization !
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Removed fast implementation of CTRL.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Binding TransformerXL tokenizers to Rust.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Updating tests accordingly.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added tokenizers as top-level modules.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Black & isort.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Rename LookupTable to WordLevel to match Rust side.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Black.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Use "fast" suffix instead of "ru" for rust tokenizers implementations.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Introduce tokenize() method on fast tokenizers.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* encode_plus dispatches to batch_encode_plus
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* batch_encode_plus now dispatches to encode if there is only one input element.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Bind all the encode_plus parameter to the forwarded batch_encode_plus call.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Bump tokenizers dependency to 0.3.0
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Formatting.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix tokenization_auto with support for new (python, fast) mapping schema.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Give correct fixtures path in test_tokenization_fast.py for the CLI.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Expose max_len_ properties on BertTokenizerFast
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Move max_len_ properties to PreTrainedTokenizerFast and override in specific subclasses.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* _convert_encoding should keep the batch axis tensor if only one sample in the batch.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add warning message for RobertaTokenizerFast if used for MLM.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added use_fast (bool) parameter on AutoTokenizer.from_pretrained().
This makes it easy to enable/disable Rust-based tokenizer instantiation.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Let's tokenizers handle all the truncation and padding stuff.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Allow to provide tokenizer arguments during pipeline creation.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Update test_fill_mask pipeline to not use fast tokenizers.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix too many parameters for convert_encoding.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* When enabling padding, max_length should be set to None.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Avoid returning nested tensors of length 1 when calling encode_plus
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Ensure output is padded when return_tensor is not None.
Tensor creation requires the initial list input to be of the exact same size.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Disable transfoxl unittest if pytorch is not available (required to load the model)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* encode_plus should not remove the leading batch axis if return_tensor is set
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Temporary disable fast tokenizers on QA pipelines.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix formatting issues.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Update tokenizers to 0.4.0
* Update style
* Enable truncation + stride unit test on fast tokenizers.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Add unittest ensuring special_tokens set match between Python and Rust.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Ensure special_tokens are correctly set during construction.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Give more warning feedback to the user in case of padding without pad_token.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* quality & format.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added possibility to add a single token as str
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Added unittest for add_tokens and add_special_tokens on fast tokenizers.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix rebase mismatch on pipelines qa default model.
QA requires cased input while the tokenizers would be uncased.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: Using offset mapping relative to the original string + unittest.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: save_vocabulary requires folder and file name
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: Simplify import for Bert.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: truncate_and_pad disables padding according to the same heuristic as the one enabling padding.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: Remove private member access in tokenize()
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: Bump tokenizers dependency to 0.4.2
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* format & quality.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: Use named arguments when applicable.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: Add Github link to Roberta/GPT2 space issue on masked input.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: Move max_len_single_sentence / max_len_sentences_pair to PreTrainedTokenizerFast + tests.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: Relax type checking to include tuple and list object.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Addressing review comment: Document the truncate_and_pad manager behavior.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Raise an exception if return_offsets_mapping is not available with the current tokenizer.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Ensure padding is set on the tokenizers before setting any padding strategy + unittest.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* On pytorch we need to stack tensor to get proper new axis.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Generalize tests to different framework removing hard written return_tensors="..."
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Bump tokenizer dependency for num_special_tokens_to_add
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Overflowing tokens in batch_encode_plus are now stacked over the batch axis.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Improved error message for padding strategy without pad token.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Bumping tokenizers dependency to 0.5.0 for release.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Optimizing convert_encoding: around 4x improvement. 🚀
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* expose pad_to_max_length in encode_plus to avoid duplicating the parameters in kwargs
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Generate a proper overflow_to_sampling_mapping when return_overflowing_tokens is True.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Fix unittests for overflow_to_sampling_mapping not being returned as tensor.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Format & quality.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Remove perfect alignment constraint for Roberta (allowing 1% difference max)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Triggering final CI
Co-authored-by: MOI Anthony <xn1t0x@gmail.com>
I trained the model for more epochs so I improved the results. This commit will update the results of the model and add a gif using it with **transformers/pipelines**
* Created model card for nlptown/bert-base-multilingual-sentiment
* Delete model card
* Created model card for bert-base-multilingual-uncased-sentiment as README
* Preserve spaces in GPT-2 tokenizers
Preserves spaces after special tokens in GPT-2 and inherited (RoBERTa)
tokenizers, enabling correct BPE encoding. Automatically inserts a space
in front of first token in encode function when adding special tokens.
* Add tokenization preprocessing method
* Add framework argument to pipeline factory
Also fixes pipeline test issue. Each test input now treated as a
distinct sequence.
PyTorch < 1.3 requires multiplication operands to be of the same type.
This was violated when using default attention mask (i.e.,
attention_mask=None in arguments) given BERT in the decoder mode.
In particular, this was breaking Model2Model and made tutorial
from the quickstart failing.
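A small sketch of the kind of fix involved, assuming the BERT-style additive extended attention mask; this is illustrative, not the exact patch:
```python
import torch

def build_extended_attention_mask(attention_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # Broadcast to (batch, 1, 1, seq_len) and cast to the model's dtype so the later
    # multiplication/addition also works on PyTorch < 1.3, which requires matching types.
    extended = attention_mask[:, None, None, :].to(dtype=dtype)
    return (1.0 - extended) * -10000.0

default_mask = torch.ones(2, 5, dtype=torch.long)  # what the model builds when attention_mask=None
print(build_extended_attention_mask(default_mask, torch.float32).shape)  # torch.Size([2, 1, 1, 5])
```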
Tensorflow 2.1.0 introduces a new dependency model where pip install tensorflow installs tf with GPU support.
Before, it would just install with CPU support, so CircleCI looks for the NVidia driver version at initialization of the
tensorflow-related tests but fails as there is no NVidia driver running.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* pass langs parameter to certain XLM models
Adding an argument that specifies the language the SQuAD dataset is in so language-sensitive XLMs (e.g. `xlm-mlm-tlm-xnli15-1024`) don't default to language `0`.
Allows resolution of issue #1799.
* fixing from `make style`
* fixing style (again)
* add "info" command to CLI
As a convenience, add the info directive to CLI. Running `python transformers-cli info` will return a string containing the transformers version, platform, python version, PT/TF version and GPU support
* Swap f-strings for .format
Still supporting 3.5 so can't use f-strings (sad face)
* Add reference in issue to CLI
* Add the expected fields to issue template
This way, people can still add the information manually if they want. (Though I fear they'll just ignore it.)
* Remove heading from output
* black-ify
* order of imports
Should ensure isort test passes
* use is_X_available over import..pass
* style
* fix copy-paste bug
* Rename command info -> env
Also adds the command to CONTRIBUTING.md in "Did you find a bug" section
The FlauBERT configuration file inherits from XLMConfig, and is recognized as such when loading from AutoModels as the XLMConfig is checked before the FlaubertConfig.
Changing the order solves this problem, but a test should be added.
* fill_mask helper
* [poc] FillMaskPipeline
* Revert "[poc] FillMaskPipeline"
This reverts commit 67eeea55b0f97b46c2b828de0f4ee97d87338335.
* Revert "fill_mask helper"
This reverts commit cacc17b884e14bb6b07989110ffe884ad9e36eaa.
* README: clarify that Pipelines can also do text-classification
cf. question at the AI&ML meetup last week, @mfuntowicz
* Fix test: test feature-extraction pipeline
* Test tweaks
* Slight refactor of existing pipeline (in preparation of new FillMaskPipeline)
* Extraneous doc
* More robust way of doing this
@mfuntowicz as we don't rely on the model name anymore (see AutoConfig)
* Also add RobertaConfig as a quickfix for wrong token_type_ids
* cs
* [BIG] FillMaskPipeline
In batch_encode_plus we have to ensure that the tokenizer has a pad_token_id so that, when padding, no None values are added as padding. That would happen with gpt2, openai, transfoxl.
closes https://github.com/huggingface/transformers/issues/2640
- mostly stylistic streamlining
- removed 'additional context' sections. They seem to be rarely used and might cause confusion. If more details are needed, users can add them to the 'details' section
T5WithLMHeadModel's doc string claims that indices of -1 are
ignored while computing the cross-entropy loss in the forward
pass; however, indices of -1 throw an error while indices of -100
are ignored. This commit updates the doc string to be consistent
with the class's behavior.
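For reference, a tiny example of the behaviour the doc string now describes: PyTorch's cross-entropy loss ignores label -100 by default, while -1 is treated as an out-of-range class index:
```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 10)          # 3 positions, vocabulary of 10
labels = torch.tensor([4, 7, -100])  # -100 marks a position to ignore

# ignore_index defaults to -100, so the last position contributes nothing to the loss.
print(F.cross_entropy(logits, labels))

# Replacing -100 with -1 would not be ignored and raises an out-of-bounds error instead.
```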
- It appears that `tqdm` only introduced `tqdm.auto` in 4.27.
- See https://github.com/tqdm/tqdm/releases/tag/v4.27.0.
- Without the lower bound I received the following stack trace in an environment where I already had tqdm installed:
```
File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/__init__.py", line 20, in <module>
from .file_utils import (TRANSFORMERS_CACHE, PYTORCH_TRANSFORMERS_CACHE, PYTORCH_PRETRAINED_BERT_CACHE,
File "/home/brendanr/anaconda3/envs/allennlp/lib/python3.6/site-packages/transformers/file_utils.py", line 24, in <module>
from tqdm.auto import tqdm
ModuleNotFoundError: No module named 'tqdm.auto'
```
Created a link between the linear layer bias and the model attribute bias. This does not change anything for the user nor for the conversion scripts, but allows the `resize_token_embeddings` method to resize the bias as well as the weights of the decoder.
Added a test.
Modified QA pipeline to consider all features for each example before generating topk answers.
The current pipeline only takes one SquadExample, one SquadFeature, one start logit list and one end logit list to retrieve the answer; this is not correct, as one SquadExample can produce multiple SquadFeatures.
Use -e only in docs targeted at contributors.
If a user copy-pastes command line with [--editable], they will hit
an error. If they don't know the --editable option, we're giving them
a choice to make before they can move forwards, but this isn't a choice
they need to make right now.
If a user or contributor ran `pip install -e .` on transformers < 3.0,
pip created a transformers.egg-info directory next to the transformers
directory at the root of the repository.
In transformers 3.0, the source is in a `src` subdirectory.
`pip install -e .` creates a transformers.egg-info directory there.
However, pip will still pick transformers.egg-info from the previous
location. This is a bug: https://github.com/pypa/pip/issues/5466
Users and contributors are likely to hit this problem because the
documentation for transformers 3.0 relies heavily on extra_requires
which didn't exist in earlier versions, so aren't defined in a stale
transformers.egg-info directory.
If such a directory exists, remove it. It's autogenerated, gitignored
and not supposed to contain anything of value.
I suspect the wrapper classes were created in order to prevent the
abstract base class (TF)CommonModelTester from being included in test
discovery and running, because that would fail.
I solved this by replacing the abstract base class with a mixin.
Code changes are just de-indenting and automatic reformattings
performed by black to use the extra line space.
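A minimal sketch of the mixin pattern described above (class names are illustrative):
```python
import unittest

class ModelTesterMixin:
    """Shared checks. Not a TestCase itself, so test discovery never tries to run it on its own."""

    model_class = None

    def test_model_can_be_constructed(self):
        self.assertIsNotNone(self.model_class())

class DummyModel:
    pass

class DummyModelTest(ModelTesterMixin, unittest.TestCase):
    model_class = DummyModel

if __name__ == "__main__":
    unittest.main()
```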
This construct isn't used anymore these days.
Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.
Use python -m unittest tests/test_foo.py instead.
This prevents transformers from being importable simply because the CWD
is the root of the git repository, while not being importable from other
directories. That led to inconsistent behavior, especially in examples.
Once you fetch this commit, in your dev environment, you must run:
$ pip uninstall transformers
$ pip install -e .
These libraries aren't always installed in the virtual environment where
isort is running. Declaring them properly avoids mixing these
third-party imports with local imports.
This change is mostly autogenerated with:
$ python -m autoflake --in-place --recursive --remove-all-unused-imports --ignore-init-module-imports examples templates transformers utils hubconf.py setup.py
I made minor changes in the generated diff.
This change is mostly autogenerated with:
$ python -m autoflake --in-place --recursive examples templates transformers utils hubconf.py setup.py
I made minor changes in the generated diff.
This is the result of:
$ black --line-length 119 examples templates transformers utils hubconf.py setup.py
There's a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.
This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.
We're already using as many processes in parallel as we have CPU cores.
Furthermore, the number of cores may be incorrectly calculated as 36
(we've seen this with pytest-xdist), which compounds the problem.
PyTorch performance craters without this.
Set the number of CPUs manually based on the Circle CI resource class,
or else we're getting 36 CPUs, which is far too much (perhaps that's
the underlying hardware and not what Circle CI allocates to us).
Don't parallelize the custom tokenizers tests because they take less
than one second to run and parallelization actually makes them slower.
Since the file is written to the filesystem, a filesystem lock is the
way to go here. Add a dependency on the third-party filelock library to
get cross-platform functionality.
Caching models across test cases and across runs of the test suite makes
slow tests somewhat more bearable.
Use gettempdir() instead of /tmp in tests. This makes it easier to
change the location of the cache with semi-standard TMPDIR/TEMP/TMP
environment variables.
Fix #2222.
This allows moving the file instead of copying it, which is more
reliable. Also it avoids writing large amounts of data to /tmp,
which may not be large enough to accommodate it.
Refs #2222.
- Empty the output directory, if it contains any files or subdirectories.
- Create the "encoder" directory inside "save_directory", if not exists.
- Create the "decoder" directory inside "save_directory", if not exists.
- Save the encoder and the decoder in the previous two directories, respectively.
* Small clarification
Matches line 431 to line 435 for additional clarity and consistency.
* Fixed minor typo
The letter "s" was previously omitted from the word "docstrings".
We currently save the pretrained_weights of the encoder and decoder in
two separate directories `encoder` and `decoder`. However, for the
`from_pretrained` function to operate with automodels we need to
specify the type of model in the path to the weights.
The path to the encoder/decoder weights is handled by the
`PreTrainedEncoderDecoder` class in the `save_pretrained` function. Since
there is no easy way to infer the type of model that was initialized for
the encoder and decoder we add a parameter `model_type` to the function.
This is not an ideal solution as it is error prone, and the model type
should be carried by the Model classes somehow.
This is a temporary fix that should be changed before merging.
We currently create encoder attention masks (when they're not provided)
based on the shape of the inputs to the encoder. This is obviously
wrong; sequences can be of different lengths. We now create the encoder
attention mask based on the batch_size and sequence_length of the
encoder hidden states.
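A minimal sketch of the corrected default (shapes and names are illustrative):
```python
import torch

def default_encoder_attention_mask(encoder_hidden_states: torch.Tensor) -> torch.Tensor:
    # Derive the mask from the encoder output shape (batch_size, seq_len, hidden_size),
    # not from the decoder inputs, since the two sequences may have different lengths.
    batch_size, seq_len = encoder_hidden_states.shape[:2]
    return torch.ones(batch_size, seq_len, dtype=torch.long, device=encoder_hidden_states.device)

hidden = torch.randn(2, 7, 16)
print(default_encoder_attention_mask(hidden).shape)  # torch.Size([2, 7])
```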
* Switch to plain unittest for skipping slow tests.
Add a RUN_SLOW environment variable for running them (a minimal sketch of such a gate follows this block).
* Switch to plain unittest for PyTorch dependency.
* Switch to plain unittest for TensorFlow dependency.
* Avoid leaking open files in the test suite.
This prevents spurious warnings when running tests.
* Fix unicode warning on Python 2 when running tests.
The warning was:
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
* Support running PyTorch tests on a GPU.
Reverts 27e015bd.
* Tests no longer require pytest.
* Make tests pass on cuda
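A minimal sketch of how a RUN_SLOW gate can be implemented with plain unittest; the actual decorator in the library may differ:
```python
import os
import unittest

def slow(test_case):
    """Skip a test unless the RUN_SLOW environment variable is set to a truthy value."""
    run_slow = os.environ.get("RUN_SLOW", "0").lower() not in ("0", "false", "")
    return unittest.skipUnless(run_slow, "test is slow; set RUN_SLOW=1 to run it")(test_case)

class ExampleTest(unittest.TestCase):
    @slow
    def test_expensive_path(self):
        self.assertTrue(True)

if __name__ == "__main__":
    unittest.main()
```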
When evaluating, shouldn't we always use the SequentialSampler instead of DistributedSampler? Evaluation only runs on 1 GPU no matter what, so if you use the DistributedSampler with N GPUs, I think you'll only evaluate on 1/N of the evaluation set. That's at least what I'm finding when I run an older/modified version of this repo.
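A small sketch of the suggested change, with illustrative variable names:
```python
import torch
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset

eval_dataset = TensorDataset(torch.arange(10))
# Evaluation runs on a single process, so use SequentialSampler rather than
# DistributedSampler, which would only cover 1/N of the examples per process.
eval_dataloader = DataLoader(eval_dataset, sampler=SequentialSampler(eval_dataset), batch_size=4)
print(sum(len(batch[0]) for batch in eval_dataloader))  # 10: every example gets evaluated
```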
Whenever target_mapping is provided to the input, XLNet outputs two different attention streams.
Based on that, the attention output will be one of the two:
- a list of tensors (the usual case for most transformers)
- a list of 2-tuples of tensors, one tensor for each of the attention streams
Docs and unit-tests have been updated
Custom schedulers are currently initiated by wrapping Pytorch's LambdaLR
class and passing a method of the wrapping class to the __init__
function of LambdaLR. This approach is not appropriate for several
reasons:
1. one does not need to define a class when it only defines a
__init__() method;
2. instantiating the parent class by passing a method of the child class
creates a cyclical reference which leads to memory leaks. See issues #1742 and #1134.
In this commit we replace the wrapper classes with functions that
instantiate `LambdaLR` with a custom learning rate function. We use a
closure to specify the parameter of the latter. We also do a bit of
renaming within the function to make the behaviour explicit, and removed
docstrings that were no longer necessary.
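A simplified sketch of the resulting pattern, a plain function returning a LambdaLR built from a closure; the exact schedule shown is illustrative:
```python
import torch
from torch.optim.lr_scheduler import LambdaLR

def get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps):
    """No wrapper class: the closure captures the schedule parameters, so no cyclic reference."""
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return current_step / max(1, num_warmup_steps)
        remaining = num_training_steps - current_step
        return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

    return LambdaLR(optimizer, lr_lambda)

optimizer = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=10, num_training_steps=100)
scheduler.step()
```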
As pointed out in #1545, when using an uncased model, and adding
a new uncased token, the tokenizer does not correctly identify this
in the case that the input text contains the token in a cased format.
For instance, if we load bert-base-uncased into BertTokenizer, and
then use .add_tokens() to add "cool-token", we get the expected
result for .tokenize('this is a cool-token'). However, we get a
possibly unexpected result for .tokenize('this is a cOOl-Token'),
which in fact mirrors the result for the former from before the new
token was added.
This commit adds
- functionality to PreTrainedTokenizer to handle this
situation in case a tokenizer (currently Bert, DistilBert,
and XLNet) has the do_lower_case=True kwarg by:
1) lowercasing tokens added with .add_tokens()
2) lowercasing text at the beginning of .tokenize()
- new common test case for tokenizers
https://github.com/huggingface/transformers/issues/1545
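A quick demonstration of the intended behaviour, assuming an uncased tokenizer (downloads the vocab on first run):
```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.add_tokens(["cool-token"])

# With the lowercasing fix, both spellings should map to the added token.
print(tokenizer.tokenize("this is a cool-token"))
print(tokenizer.tokenize("this is a cOOl-Token"))
```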
We currently initialize `encoder_attention_mask` when it is `None`,
whether the stack is that of an encoder or a decoder. Since this
may lead to bugs that are difficult to track down, I added a condition
that assesses whether the current stack is a decoder.
Mish is a new activation function proposed here - https://arxiv.org/abs/1908.08681
It has seen some recent success and has been adopted in spaCy, Thinc, TensorFlow Addons and FastAI-dev.
All benchmarks recorded till now (including against ReLU, Swish and GELU) are available in the repository - https://github.com/digantamisra98/Mish
Might be a good addition to experiment with especially in the Bert Model.
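Mish is defined as x * tanh(softplus(x)); a minimal PyTorch version for reference:
```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    """Mish activation: x * tanh(softplus(x))."""
    return x * torch.tanh(F.softplus(x))

print(mish(torch.tensor([-1.0, 0.0, 1.0])))
```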
- Fix hanging when loading pretrained models from the cache without having internet access. This is a widespread issue on supercomputers whose internal compute nodes are firewalled.
The introduction of a decoder introduces 2 changes:
- We need to be able to specify a separate mask in the cross
attention to mask the positions corresponding to padding tokens in the
encoder state.
- The self-attention in the decoder needs to be causal on top of not
attending to padding tokens.
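A small sketch of the decoder self-attention mask described above: a causal mask combined with the padding mask (0/1 masks, shapes illustrative):
```python
import torch

def decoder_self_attention_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Combine a causal (lower-triangular) mask with the padding mask -> (batch, seq_len, seq_len)."""
    batch_size, seq_len = attention_mask.shape
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=attention_mask.dtype))
    # A query position i may attend to key position j only if j <= i and j is not padding.
    return causal[None, :, :] * attention_mask[:, None, :]

pad_mask = torch.tensor([[1, 1, 1, 0]])  # last position is padding
print(decoder_self_attention_mask(pad_mask))
```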
the definition of `get_masks` would blow up with the proper combination of
arguments. It was just a matter of moving a definition outside of a
control structure.
We currently instantiate encoders and decoders for the seq2seq by
passing the `is_decoder` keyword argument to the `from_pretrained`
classmethod. On the other hand, the model class looks for the value
of the `is_decoder` attribute in its config.
In order for the value to propagate from the kwarg to the configuration
we simply need to define `is_decoder` as an attribute to the base
`PretrainedConfig`, with a default at `False`.
the data provided by Li Dong et al. were already tokenized, which means
that they are not compatible with all the models in the library. We
thus process the raw data directly and tokenize them using the models'
tokenizers.
We write a function to load and preprocess the CNN/Daily Mail dataset as
provided by Li Dong et al. The issue is that this dataset has already
been tokenized by the authors, so we actually need to find the original,
plain-text dataset if we want to apply it to all models.
In Rothe et al.'s "Leveraging Pre-trained Checkpoints for Sequence
Generation Tasks", Bert2Bert is initialized with pre-trained weights for
the encoder, and only pre-trained embeddings for the decoder. The
current version of the code completely randomizes the weights of the
decoder.
We write a custom function to initialize the weights of the decoder; we
first initialize the decoder with the weights and then randomize
everything but the embeddings.
Since the preloading of weights relies on the name of the class's
attributes changing the namespace breaks loading pretrained weights on
Bert and all related models. I reverted `self_attention` to `attention`
and use `crossattention` for the decoder instead.
In the seq2seq model we need to both load pretrained weights in the
encoder and initialize the decoder randomly. Because the
`from_pretrained` method defined in the base class relies on module
names to assign weights, it would also initialize the decoder with
pretrained weights. To avoid this we override the method to only
initialize the encoder with pretrained weights.
The modifications that I introduced in a previous commit did break
Bert's internal API. I reverted these changes and added more general
classes to handle the encoder-decoder attention case.
There may be a more elegant way to deal with retro-compatibility (I am
not comfortable with the current state of the code), but I cannot see it
right now.
There is currently no way to specify the query, key and value separately
in the Attention module. However, the decoder's "encoder-decoder
attention" layers take the decoder's last output as a query, the
encoder's states as key and value. We thus modify the existing code so
query, key and value can be added separately.
This obviously strains some naming conventions; `BertSelfAttention` is not
a self-attention module anymore. The way the residual is forwarded is
now awkward, etc. We will need to do some refactoring once the decoder is
fully implemented.
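A stripped-down sketch of what the generalized attention call looks like once query, key and value can come from different places (single head, no projections, purely illustrative):
```python
import math
import torch

def cross_attention(query_states: torch.Tensor, key_value_states: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention where the query comes from the decoder and
    the keys/values come from the encoder hidden states."""
    d = query_states.size(-1)
    scores = torch.matmul(query_states, key_value_states.transpose(-1, -2)) / math.sqrt(d)
    probs = torch.softmax(scores, dim=-1)
    return torch.matmul(probs, key_value_states)

decoder_out = torch.randn(2, 5, 16)  # decoder output used as the query
encoder_out = torch.randn(2, 9, 16)  # encoder states used as key and value
print(cross_attention(decoder_out, encoder_out).shape)  # torch.Size([2, 5, 16])
```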
adding conversion script
adding first draft of modeling & tokenization
adding placeholder for test files
bunch of changes
registering the tokenizer/model/etc
tests
change link; something is very VERY wrong here
weird end-of-word thingy going on
i think the tokenization works now ; wrote the unit tests
overall structure works; load w next
the monster is alive!
works after some cleanup as well
adding emacs autosave to gitignore
currently only supporting the 48 layer one; seems to infer fine on my macbook
cleanup
fixing some documentation
fixing some documentation
tests passing?
now works on CUDA also
adding greedy?
adding greedy sampling
works well
Attention output was in bnij ordering instead of ijbn, which everything
else expects. The ordering was an oversight on my part, and this fix keeps the
attention inputs/outputs identical to the original code.
Also moved back from tensor slicing to index_select in rel_shift_bnij to
make the tracer happy.
Significant performance boost over the original orderings:
on an already somewhat optimised branch this gave me > 2x end-to-end
throughput on a squad xlnet fine-tuning task (batch 8, seq-length 612,
fp16)
Lines 183 - 200, fixed indentation. Line 198, replaced `tokenizer_class` with `BertTokenizer`, since `tokenizer_class` is not defined in the loop it belongs to.
`glue_convert_examples_to_features` assumed that tensorflow_dataset
examples contains the features `'sentence1'` and `'sentence2'`. This
commit encapsulates the choice of features in the glue processor and
uses that to parse examples.
Now raises a warning when a head to be deleted already has been deleted. An integration test verifying the total pipeline (-> from config -> save model -> load model -> additional head pruning) has been added.
I assume that it should test the `re-load` functionality after testing the `save` functionality; however, I'm also surprised that nobody has pointed this out after such a long time, so maybe I've misunderstood the purpose. This PR is just in case :)
Currently the L2 regularization is hard-coded to "0.01", even though there is a --weight_decay flag implemented (that is unused). I'm making this flag control the weight decay used for fine-tuning in this script.
splitlines() does not work as we expect here for bert-base-chinese because there is a '\u2028' (unicode line separator) token in the vocab file. The value of '\u2028\n'.splitlines() (such a line as read from the file) is ['', ''].
Perhaps we should use readlines() instead.
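A tiny reproduction of the problem (illustrative vocab content):
```python
# '\u2028' (the Unicode LINE SEPARATOR) is treated as a line break by str.splitlines(),
# so a vocab file containing it yields extra empty "tokens" and shifts every index after it.
vocab_text = "hello\n\u2028\nworld\n"
print(vocab_text.splitlines())      # ['hello', '', '', 'world'] -> 4 entries instead of 3
print(vocab_text.split("\n")[:-1])  # ['hello', '\u2028', 'world'] -> what a readlines-style parse keeps
```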
The reason for the issue was that optimization steps were computed from the example size, which is different from the actual size of the dataloader when an example is chunked into multiple instances.
Solution in this pull request is to compute num_optimization_steps directly from len(data_loader).
This fix is in reference to issue #382. GPT2 can now be trained in mixed precision, which I've confirmed with testing. I also tested unconditional generation on multiple seeds before and after changing 1e10 to 1e4 and there was no difference. Please let me know if there is anything else I can do to make this pull request better. Thanks for all your work!
Fix for
```
04/09/2019 21:39:38 - INFO - __main__ - device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
Traceback (most recent call last):
File "/home/ubuntu/pytorch-pretrained-BERT/examples/lm_finetuning/simple_lm_finetuning.py", line 642, in <module>
main()
File "/home/ubuntu/pytorch-pretrained-BERT/examples/lm_finetuning/simple_lm_finetuning.py", line 502, in main
raise ValueError("Training is currently the only implemented execution option. Please set `do_train`.")
ValueError: Training is currently the only implemented execution option. Please set `do_train`.
```
If randint returns the value of rand_end, searchsorted returns the sampled_doc_index that matches current_idx.
example:
cumsum_max = {int64} 30
doc_cumsum = {ndarray} [ 5 7 11 19 30]
doc_lengths = {list} <class 'list'>: [5, 2, 4, 8, 11]
if current_idx = 1,
rand_start = 7
rand_end = 35
sentence_index = randint(7, 35) % cumsum_max
If randint returns 35, sentence_index becomes 5.
If sentence_index is 5, np.searchsorted returns 1, which equals current_idx.
The unconditional generation works now but if the seed is fixed, the sample is the same every time.
n_samples > 1 will give different samples though.
I am giving the start token as '<|endoftext|>' for the unconditional generation.
For many applications requiring randomized data access, it's easier to cache the tokenized representations than the words. So why not turn this into a warning?
# Check on the job periodically. Set the status code depending on what
# happened to the job in Kubernetes. If we try max_checks times and
# still the job hasn't finished, give up and return the starting
# non-zero status code.
while [ $i -lt $max_checks ]; do ((i++)); if kubectl get jobs $job_name -o jsonpath='Failed:{.status.failed}' | grep "Failed:1"; then status_code=1 && break; elif kubectl get jobs $job_name -o jsonpath='Succeeded:{.status.succeeded}' | grep "Succeeded:1" ; then status_code=0 && break; else echo "Job not finished yet"; fi; sleep 30; done && \
echo "Done waiting. Job status code: $status_code" && \
pod_name=$(kubectl get po -l controller-uid=`kubectl get job $job_name -o "jsonpath={.metadata.labels.controller-uid}"` | awk 'match($0,!/NAME/) {print $1}') && \
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The tasks I am working on are:
* [ ] an official GLUE/SQUaD task: (give the name)
* [ ] my own task or dataset: (give details below)
## To reproduce
Steps to reproduce the behavior:
1.
2.
3.
<!-- If you have code snippets, error messages, stack traces please provide them here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.-->
## Expected behavior
<!-- A clear and concise description of what you would expect to happen. -->
name: "\U0001F4DA Migration from pytorch-pretrained-bert or pytorch-transformers"
about: Report a problem when migrating from pytorch-pretrained-bert or pytorch-transformers
to transformers
title: ''
labels: Migration
assignees: ''
---
# 📚 Migration
## Information
<!-- Important information -->
Model I am using (Bert, XLNet ...):
Language I am using the model on (English, Chinese ...):
The problem arises when using:
* [ ] the official example scripts: (give details below)
* [ ] my own modified scripts: (give details below)
The tasks I am working on are:
* [ ] an official GLUE/SQUaD task: (give the name)
* [ ] my own task or dataset: (give details below)
## Details
<!-- A clear and concise description of the migration issue.
If you have code snippets, please provide it here as well.
Important! Use code tags to correctly format your code. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are hard to read and (more importantly) don't allow others to copy-and-paste your code.
-->
## Environment info
<!-- You can run the command `python transformers-cli env` and copy-and-paste its output below.
Don't forget to fill out the missing fields in that output! -->
- `transformers` version:
- Platform:
- Python version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?):
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
<!-- IMPORTANT: which version of the former library do you use? -->
* `pytorch-transformers` or `pytorch-pretrained-bert` version (or branch):
## Checklist
- [ ] I have read the migration guide in the readme.
Congratulations! You've made it this far! You're not quite done yet though.
Once merged, your PR is going to appear in the release notes with the title you set, so make sure it's a great title that fully reflects the extent of your awesome contribution.
Then, please replace this with a description of the change and which issue is fixed (if applicable). Please also include relevant motivation and context. List any dependencies (if any) that are required for this change.
Once you're done, someone will review your PR shortly (see the section "Who can review?" below to tag some potential reviewers). They may suggest changes to make the code even better. If no one has reviewed your PR after a week has passed, don't hesitate to post a new comment @-mentioning the same people; sometimes notifications get lost.
-->
<!-- Remove if not applicable -->
Fixes # (issue)
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/master/CONTRIBUTING.md#start-contributing-pull-requests),
Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link
to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the
[documentation guidelines](https://github.com/huggingface/transformers/tree/master/docs), and
[here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/master/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
## Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
<!-- Your PR will be replied to more quickly if you can figure out the right person to tag with @
If you know how to use git blame, that is the easiest way, otherwise, here is a rough guide of **who to tag**.
If you have already cloned that repo, you might need to `git pull` to get the most recent changes in the `datasets`
library.
5. Develop the features on your branch.
As you work on the features, you should make sure that the test suite
passes:
```bash
$ make test
```
Note that this command uses the `-n auto` pytest flag, so it will start as many parallel `pytest` processes as your computer has CPU cores; if you have lots of cores, a few GPUs and not a great amount of RAM, it is likely to overload your machine. In that case you may want to bound the number of parallel workers instead.
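A hedged sketch of such a bounded invocation (the worker count and flags shown here are illustrative, not the exact command from the contributing guide):
```bash
# Run the test suite with a fixed, small number of parallel workers (requires pytest-xdist)
python -m pytest -n 3 --dist=loadfile -s -v ./tests/
```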
#### This guide was heavily inspired by the awesome [scikit-learn guide to contributing](https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md)
### Develop on Windows
One way to run the `make` command on Windows is to use MSYS2:
1. [Download MSYS2](https://www.msys2.org/); we assume it is installed in `C:\msys64`
2. Open the command line C:\msys64\msys2.exe (it should be available from the start menu)
3. Run in the shell: `pacman -Syu` and install make with `pacman -S make`
This repository contains an op-for-op PyTorch reimplementation of [Google's TensorFlow repository for the BERT model](https://github.com/google-research/bert) that was released together with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
<h3 align="center">
<p>State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0
</h3>
This implementation is provided with [Google's pre-trained models](https://github.com/google-research/bert), examples, notebooks and a command-line interface to load any pre-trained TensorFlow checkpoint for BERT.
🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, etc., in 100+ languages. Its aim is to make cutting-edge NLP easier to use for everyone.
## Content
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture can be used as a standalone and modified to enable quick research experiments.
| Section | Description |
|-|-|
| [Installation](#installation) | How to install the package |
| [Overview](#overview) | Overview of the package |
| [Usage](#usage) | Quickstart examples |
| [Doc](#doc) | Detailed documentation |
| [Examples](#examples) | Detailed examples on how to fine-tune Bert |
| [Notebooks](#notebooks) | Introduction on the provided Jupyter Notebooks |
| [TPU](#tpu) | Notes on TPU support and pretraining scripts |
| [Command-line interface](#Command-line-interface) | Convert a TensorFlow checkpoint into a PyTorch dump |
🤗 Transformers is backed by the two most popular deep learning libraries, [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/), with a seamless integration between them, allowing you to train your models with one then load it for inference with the other.
You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer an [inference API](https://huggingface.co/pricing) to use those models.
Here are a few examples:
- [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
- [Named Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
- [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
- [Natural Language Inference with RoBERTa](https://huggingface.co/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal)
- [Summarization with BART](https://huggingface.co/facebook/bart-large-cnn?text=The+tower+is+324+metres+%281%2C063+ft%29+tall%2C+about+the+same+height+as+an+81-storey+building%2C+and+the+tallest+structure+in+Paris.+Its+base+is+square%2C+measuring+125+metres+%28410+ft%29+on+each+side.+During+its+construction%2C+the+Eiffel+Tower+surpassed+the+Washington+Monument+to+become+the+tallest+man-made+structure+in+the+world%2C+a+title+it+held+for+41+years+until+the+Chrysler+Building+in+New+York+City+was+finished+in+1930.+It+was+the+first+structure+to+reach+a+height+of+300+metres.+Due+to+the+addition+of+a+broadcasting+aerial+at+the+top+of+the+tower+in+1957%2C+it+is+now+taller+than+the+Chrysler+Building+by+5.2+metres+%2817+ft%29.+Excluding+transmitters%2C+the+Eiffel+Tower+is+the+second+tallest+free-standing+structure+in+France+after+the+Millau+Viaduct)
- [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
- [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
**[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repo’s text generation capabilities.
## Quick tour
To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
```python
>>> from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to include pipeline into the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9978193640708923}]
```
The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.8%.
Here is another example, this time using a pipeline that extracts the answer to a question from some context.
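A minimal sketch of such a question-answering pipeline (the question, context and comment about the output are illustrative, not taken verbatim from the original example):
```python
>>> from transformers import pipeline

# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer(
...     question='What library provides the pipeline API?',
...     context='The pipeline API is provided by the transformers library.'
... )
# The result is a dict with 'score', 'start', 'end' and 'answer' keys
```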
On top of the answer, the pretrained model used here returned its confidence score, along with the start position and its end position in the tokenized sentence. You can learn more about the tasks supported by the `pipeline` API in [this tutorial](https://huggingface.co/transformers/task_summary.html).
To download and use any of the pretrained models on your given task, you just need these three lines of code (PyTorch version):
```python
>>> from transformers import AutoTokenizer, AutoModel
```
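The remainder of the snippet (instantiating the tokenizer and model, then running them on a text) is sketched below; the checkpoint name `bert-base-uncased` and the input sentence are illustrative choices:
```python
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

# The tokenizer returns a dictionary of tensors that can be passed straight to the model
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
```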
The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on one text (or a list of texts), as we can see in the code above. It will output a dictionary that you can directly pass to your model.
The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [TensorFlow `tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) (depending on your backend) which you can use normally. For instance, [this tutorial](https://huggingface.co/transformers/training.html) explains how to integrate such a model in a classic PyTorch or TensorFlow training loop, or how to use our `Trainer` API to quickly fine-tune it on a new dataset.
## Why should I use transformers?
1. Easy-to-use state-of-the-art models:
- High performance on NLU and NLG tasks.
- Low barrier to entry for educators and practitioners.
- Few user-facing abstractions with just three classes to learn.
- A unified API for using all our pretrained models.
1. Lower compute costs, smaller carbon footprint:
- Researchers can share trained models instead of always retraining.
- Practitioners can reduce compute time and production costs.
- Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
1. Choose the right framework for every part of a model's lifetime:
- Train state-of-the-art models in 3 lines of code.
- Move a single model between TF2.0/PyTorch frameworks at will.
- Seamlessly pick the right framework for training, evaluation, production.
1. Easily customize a model or an example to your needs:
- Examples for each architecture to reproduce the results by the official authors of said architecture.
- Expose the models' internals as consistently as possible.
- Model files can be used independently of the library for quick experiments.
## Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is deliberately not refactored with additional abstractions, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
- While we strive to present as many use cases as possible, the scripts in our [examples folder](https://github.com/huggingface/transformers/tree/master/examples) are just that: examples. It is expected that they won't work out of the box on your specific problem and that you will need to change a few lines of code to adapt them to your needs.
## Installation
This repo was tested on Python 3.5+ and PyTorch 0.4.1
### With pip
PyTorch pretrained bert can be installed by pip as follows:
This repository is tested on Python 3.6+, PyTorch 1.0.0+ (PyTorch 1.3.1+ for [examples](https://github.com/huggingface/transformers/tree/master/examples)) and TensorFlow 2.0.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
First, create a virtual environment with the version of Python you're going to use and activate it.
Then, you will need to install at least one of TensorFlow 2.0, PyTorch or Flax.
Please refer to [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available), [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform and/or [Flax installation page](https://github.com/google/flax#quick-install).
When TensorFlow 2.0 and/or PyTorch have been installed, 🤗 Transformers can be installed using pip as follows:
```bash
pip install pytorch-pretrained-bert
pip install transformers
```
### From source
If you'd like to play with the examples, you must [install the library from source](https://huggingface.co/transformers/installation.html#installing-from-source).
Clone the repository and run:
```bash
pip install [--editable] .
```
### With conda
Since Transformers version v4.0.0, we now have a conda channel: `huggingface`.
🤗 Transformers can be installed using conda as follows:
```bash
conda install -c huggingface transformers
```
A series of tests is included in the [tests folder](https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/tests) and can be run using `pytest` (install pytest if needed: `pip install pytest`).
Follow the installation pages of TensorFlow, PyTorch or Flax to see how to install them with conda.
You can run the tests with the command:
```bash
python -m pytest -sv tests/
```
## Model architectures
## Overview
🤗 Transformers currently provides the following architectures (see [here](https://huggingface.co/transformers/model_summary.html) for a high-level summary of each of them):
This package comprises the following classes that can be imported in Python and are detailed in the [Doc](#doc) section of this readme:
1. **[ALBERT](https://huggingface.co/transformers/model_doc/albert.html)** (from Google Research and the Toyota Technological Institute at Chicago) released with the paper [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942), by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
1. **[BART](https://huggingface.co/transformers/model_doc/bart.html)** (from Facebook) released with the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf) by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
1. **[BERT](https://huggingface.co/transformers/model_doc/bert.html)** (from Google) released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
1. **[BERT For Sequence Generation](https://huggingface.co/transformers/model_doc/bertgeneration.html)** (from Google) released with the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
1. **[Blenderbot](https://huggingface.co/transformers/model_doc/blenderbot.html)** (from Facebook) released with the paper [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637) by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
1. **[CamemBERT](https://huggingface.co/transformers/model_doc/camembert.html)** (from Inria/Facebook/Sorbonne) released with the paper [CamemBERT: a Tasty French Language Model](https://arxiv.org/abs/1911.03894) by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
1. **[CTRL](https://huggingface.co/transformers/model_doc/ctrl.html)** (from Salesforce) released with the paper [CTRL: A Conditional Transformer Language Model for Controllable Generation](https://arxiv.org/abs/1909.05858) by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
1. **[DeBERTa](https://huggingface.co/transformers/model_doc/deberta.html)** (from Microsoft Research) released with the paper [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
1. **[DialoGPT](https://huggingface.co/transformers/model_doc/dialogpt.html)** (from Microsoft Research) released with the paper [DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation](https://arxiv.org/abs/1911.00536) by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
1. **[DistilBERT](https://huggingface.co/transformers/model_doc/distilbert.html)** (from HuggingFace), released together with the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108) by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into [DistilGPT2](https://github.com/huggingface/transformers/tree/master/examples/distillation), RoBERTa into [DistilRoBERTa](https://github.com/huggingface/transformers/tree/master/examples/distillation), Multilingual BERT into [DistilmBERT](https://github.com/huggingface/transformers/tree/master/examples/distillation) and a German version of DistilBERT.
1. **[DPR](https://huggingface.co/transformers/model_doc/dpr.html)** (from Facebook) released with the paper [Dense Passage Retrieval
for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon
Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
1. **[ELECTRA](https://huggingface.co/transformers/model_doc/electra.html)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[FlauBERT](https://huggingface.co/transformers/model_doc/flaubert.html)** (from CNRS) released with the paper [FlauBERT: Unsupervised Language Model Pre-training for French](https://arxiv.org/abs/1912.05372) by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
1. **[Funnel Transformer](https://huggingface.co/transformers/model_doc/funnel.html)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
1. **[GPT](https://huggingface.co/transformers/model_doc/gpt.html)** (from OpenAI) released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
1. **[GPT-2](https://huggingface.co/transformers/model_doc/gpt2.html)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
1. **[LayoutLM](https://huggingface.co/transformers/model_doc/layoutlm.html)** (from Microsoft Research Asia) released with the paper [LayoutLM: Pre-training of Text and Layout for Document Image Understanding](https://arxiv.org/abs/1912.13318) by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
1. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
1. **[LXMERT](https://huggingface.co/transformers/model_doc/lxmert.html)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
1. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
1. **[MBart](https://huggingface.co/transformers/model_doc/mbart.html)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
1. **[MT5](https://huggingface.co/transformers/model_doc/mt5.html)** (from Google AI) released with the paper [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
1. **[Pegasus](https://huggingface.co/transformers/model_doc/pegasus.html)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
1. **[ProphetNet](https://huggingface.co/transformers/model_doc/prophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
1. **[RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html)** (from Facebook), released together with the paper [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://arxiv.org/abs/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
1. **[SqueezeBert](https://huggingface.co/transformers/model_doc/squeezebert.html)** released with the paper [SqueezeBERT: What can computer vision teach NLP about efficient neural networks?](https://arxiv.org/abs/2006.11316) by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
1. **[T5](https://huggingface.co/transformers/model_doc/t5.html)** (from Google AI) released with the paper [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683) by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
1. **[Transformer-XL](https://huggingface.co/transformers/model_doc/transformerxl.html)** (from Google/CMU) released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
1. **[XLM](https://huggingface.co/transformers/model_doc/xlm.html)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
1. **[XLM-ProphetNet](https://huggingface.co/transformers/model_doc/xlmprophetnet.html)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
1. **[XLNet](https://huggingface.co/transformers/model_doc/xlnet.html)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
1. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
1. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
- Six PyTorch models (`torch.nn.Module`) for Bert with pre-trained weights (in the [`modeling.py`](./pytorch_pretrained_bert/modeling.py) file):
- [`BertModel`](./pytorch_pretrained_bert/modeling.py#L535) - raw BERT Transformer model (**fully pre-trained**),
- [`BertForMaskedLM`](./pytorch_pretrained_bert/modeling.py#L689) - BERT Transformer with the pre-trained masked language modeling head on top (**fully pre-trained**),
- [`BertForNextSentencePrediction`](./pytorch_pretrained_bert/modeling.py#L750) - BERT Transformer with the pre-trained next sentence prediction classifier on top (**fully pre-trained**),
- [`BertForPreTraining`](./pytorch_pretrained_bert/modeling.py#L618) - BERT Transformer with masked language modeling head and next sentence prediction classifier on top (**fully pre-trained**),
- [`BertForSequenceClassification`](./pytorch_pretrained_bert/modeling.py#L812) - BERT Transformer with a sequence classification head on top (BERT Transformer is **pre-trained**, the sequence classification head **is only initialized and has to be trained**),
- [`BertForQuestionAnswering`](./pytorch_pretrained_bert/modeling.py#L877) - BERT Transformer with a token classification head on top (BERT Transformer is **pre-trained**, the token classification head **is only initialized and has to be trained**).
To check whether each model has an implementation in PyTorch/TensorFlow/Flax or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to [this table](https://huggingface.co/transformers/index.html#bigtable).
- Three tokenizers (in the [`tokenization.py`](./pytorch_pretrained_bert/tokenization.py) file):
- `BertTokenizer` - performs end-to-end tokenization, i.e. basic tokenization followed by WordPiece tokenization.
These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations. You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
- One optimizer (in the [`optimization.py`](./pytorch_pretrained_bert/optimization.py) file):
- `BertAdam` - Bert version of the Adam algorithm with weight decay fix, warmup and linear decay of the learning rate.
- A configuration class (in the [`modeling.py`](./pytorch_pretrained_bert/modeling.py) file):
- `BertConfig` - Configuration class to store the configuration of a `BertModel`, with utilities to read and write from JSON configuration files.
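As a small illustration of those utilities, a sketch assuming the old `pytorch_pretrained_bert` package (the JSON file name is a placeholder):
```python
from pytorch_pretrained_bert.modeling import BertConfig

# Read a model configuration from a JSON file and serialize it back to JSON
config = BertConfig.from_json_file('bert_config.json')
print(config.to_json_string())
```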
## Learn more
The repository further comprises:
- Three examples on how to use Bert (in the [`examples` folder](./examples)):
- [`extract_features.py`](./examples/extract_features.py) - Show how to extract hidden states from an instance of `BertModel`,
- [`run_classifier.py`](./examples/run_classifier.py) - Show how to fine-tune an instance of `BertForSequenceClassification` on GLUE's MRPC task,
- [`run_squad.py`](./examples/run_squad.py) - Show how to fine-tune an instance of `BertForQuestionAnswering` on SQuAD v1.0 task.
These examples are detailed in the [Examples](#examples) section of this readme.
- Three notebooks that were used to check that the TensorFlow and PyTorch models behave identically (in the [`notebooks` folder](./notebooks)):
- [`Comparing-TF-and-PT-models.ipynb`](./notebooks/Comparing-TF-and-PT-models.ipynb) - Compare the hidden states predicted by `BertModel`,
- [`Comparing-TF-and-PT-models-SQuAD.ipynb`](./notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb) - Compare the spans predicted by `BertForQuestionAnswering` instances,
- [`Comparing-TF-and-PT-models-MLM-NSP.ipynb`](./notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb) - Compare the predictions of the `BertForPreTraining` instances.
These notebooks are detailed in the [Notebooks](#notebooks) section of this readme.
- A command-line interface to convert any TensorFlow checkpoint into a PyTorch dump:
This CLI is detailed in the [Command-line interface](#Command-line-interface) section of this readme.
## Usage
Here is a quick-start example using the `BertTokenizer`, `BertModel` and `BertForMaskedLM` classes with Google AI's pre-trained `Bert base uncased` model. See the [doc section](#doc) below for all the details on these classes.
First, let's prepare a tokenized input with `BertTokenizer`.
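A minimal sketch of that preparation step, assuming the old `pytorch_pretrained_bert` package and the `bert-base-uncased` shortcut name (the sentence is illustrative):
```python
import torch
from pytorch_pretrained_bert import BertTokenizer

# Load the pre-trained tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Basic + WordPiece tokenization, then map tokens to vocabulary indices
text = "Who was Jim Henson? Jim Henson was a puppeteer"
tokenized_text = tokenizer.tokenize(text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

# Batch of size 1, shape [batch_size, sequence_length]
tokens_tensor = torch.tensor([indexed_tokens])
```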
Here is a detailed documentation of the classes in the package and how to use them:
| Section | Description |
|-|-|
| [Loading Google AI's pre-trained weights](#Loading-Google-AIs-pre-trained-weights-and-PyTorch-dump) | How to load Google AI's pre-trained weights or a PyTorch saved instance |
| [PyTorch models](#PyTorch-models) | API of the six PyTorch model classes: `BertModel`, `BertForMaskedLM`, `BertForNextSentencePrediction`, `BertForPreTraining`, `BertForSequenceClassification` or `BertForQuestionAnswering` |
| [Tokenizer: `BertTokenizer`](#Tokenizer-BertTokenizer) | API of the `BertTokenizer` class|
| [Optimizer: `BertAdam`](#Optimizer-BertAdam) | API of the `BertAdam` class |
| [Documentation](https://huggingface.co/transformers/) | Full API documentation and tutorials |
| [Task summary](https://huggingface.co/transformers/task_summary.html) | Tasks supported by 🤗 Transformers |
| [Preprocessing tutorial](https://huggingface.co/transformers/preprocessing.html) | Using the `Tokenizer` class to prepare data for the models |
| [Training and fine-tuning](https://huggingface.co/transformers/training.html) | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the `Trainer` API |
| [Quick tour: Fine-tuning/usage scripts](https://github.com/huggingface/transformers/tree/master/examples) | Example scripts for fine-tuning models on a wide range of tasks |
| [Model sharing and uploading](https://huggingface.co/transformers/model_sharing.html) | Upload and share your fine-tuned models with the community |
| [Migration](https://huggingface.co/transformers/migration.html) | Migrate to 🤗 Transformers from `pytorch-transformers` or `pytorch-pretrained-bert` |
### Loading Google AI's pre-trained weights and PyTorch dump
## Citation
To load one of Google AI's pre-trained models or a PyTorch saved model (an instance of `BertForPreTraining` saved with `torch.save()`), the PyTorch model classes and the tokenizer can be instantiated as
We now have a [paper](https://arxiv.org/abs/1910.03771) you can cite for the 🤗 Transformers library:
```bibtex
@article{Wolf2019HuggingFacesTS,
title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush},
journal={ArXiv},
year={2019},
volume={abs/1910.03771}
}
```
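The instantiation pattern that the "where" clause below describes is roughly the following (a sketch; `BERT_CLASS` and `PRE_TRAINED_MODEL_NAME_OR_PATH` are the placeholders explained just after):
```python
# BERT_CLASS is one of the tokenizer/model classes listed below;
# PRE_TRAINED_MODEL_NAME_OR_PATH is a shortcut name or a path/url to a model archive
model = BERT_CLASS.from_pretrained(PRE_TRAINED_MODEL_NAME_OR_PATH, cache_dir=None)
```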
where
- `BERT_CLASS` is either the `BertTokenizer` class (to load the vocabulary) or one of the six PyTorch model classes (to load the pre-trained weights): `BertModel`, `BertForMaskedLM`, `BertForNextSentencePrediction`, `BertForPreTraining`, `BertForSequenceClassification` or `BertForQuestionAnswering`, and
- `PRE_TRAINED_MODEL_NAME_OR_PATH` is either:
  - the shortcut name of one of Google AI's pre-trained models selected in the list:
    - `bert-base-chinese`: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
  - a path or url to a pretrained model archive containing:
    - `bert_config.json`, a configuration file for the model, and
    - `pytorch_model.bin`, a PyTorch dump of a pre-trained instance of `BertForPreTraining` (saved with the usual `torch.save()`)
If `PRE_TRAINED_MODEL_NAME_OR_PATH` is a shortcut name, the pre-trained weights will be downloaded from AWS S3 (see the links [here](pytorch_pretrained_bert/modeling.py)) and stored in a cache folder to avoid future download (the cache folder can be found at `~/.pytorch_pretrained_bert/`).
- `cache_dir` can be an optional path to a specific directory to download and cache the pre-trained model weights. This option is useful in particular when you are using distributed training: to avoid concurrent access to the same weights you can set for example `cache_dir='./pretrained_model_{}'.format(args.local_rank)` (see the section on distributed training for more information)
`BertModel` is the basic BERT Transformer model with a layer of summed token, position and sequence embeddings followed by a series of identical self-attention blocks (12 for BERT-base, 24 for BERT-large).
The inputs and output are **identical to the TensorFlow model inputs and outputs**.
We detail them here. This model takes as *inputs*:
- `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] with the word token indices in the vocabulary (see the token preprocessing logic in the scripts `extract_features.py`, `run_classifier.py` and `run_squad.py`), and
- `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token type indices selected in [0, 1]. Type 0 corresponds to a `sentence A` token and type 1 corresponds to a `sentence B` token (see the BERT paper for more details).
- `attention_mask`: an optional torch.LongTensor of shape [batch_size, sequence_length] with indices selected in [0, 1]. It's a mask to be used if some input sequence lengths are smaller than the max input sequence length of the current batch. It's the mask that we typically use for attention when a batch has varying length sentences.
- `output_all_encoded_layers`: a boolean which controls the content of the `encoded_layers` output as described below. Default: `True`.
This model *outputs* a tuple composed of:
- `encoded_layers`: controlled by the value of the `output_all_encoded_layers` argument:
  - `output_all_encoded_layers=True`: outputs a list of the encoded hidden states at the end of each attention block (i.e. 12 full sequences for BERT-base, 24 for BERT-large), each encoded hidden state being a torch.FloatTensor of size [batch_size, sequence_length, hidden_size],
  - `output_all_encoded_layers=False`: outputs only the encoded hidden states corresponding to the last attention block, i.e. a single torch.FloatTensor of size [batch_size, sequence_length, hidden_size],
- `pooled_output`: a torch.FloatTensor of size [batch_size, hidden_size] which is the output of a classifier pretrained on top of the hidden state associated with the first token of the input (`[CLS]`) to train on the Next-Sentence task (see BERT's paper).
An example on how to use this class is given in the `extract_features.py` script which can be used to extract the hidden states of the model for a given input.
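A minimal sketch of a forward pass through `BertModel`, under the old `pytorch_pretrained_bert` API described above (the checkpoint name and input sentence are illustrative):
```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

# Build input_ids of shape [batch_size, sequence_length]
tokens = tokenizer.tokenize("hello world")
tokens_tensor = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # encoded_layers: list of [batch_size, sequence_length, hidden_size] tensors (one per layer)
    # pooled_output:  [batch_size, hidden_size]
    encoded_layers, pooled_output = model(tokens_tensor)
```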
#### 2. `BertForPreTraining`
`BertForPreTraining` includes the `BertModel` Transformer followed by the two pre-training heads:
- the masked language modeling head, and
- the next sentence classification head.
*Inputs* comprise the inputs of the [`BertModel`](#-1.-`BertModel`) class plus two optional labels:
- `masked_lm_labels`: masked language modeling labels: a torch.LongTensor of shape [batch_size, sequence_length] with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked); the loss is only computed for the labels set in [0, ..., vocab_size].
- `next_sentence_label`: next sentence classification label: a torch.LongTensor of shape [batch_size] with indices selected in [0, 1]. 0 => the next sentence is the continuation, 1 => the next sentence is a random sentence.
*Outputs*:
- if `masked_lm_labels` and `next_sentence_label` are not `None`: Outputs the total_loss which is the sum of the masked language modeling loss and the next sentence classification loss.
- if `masked_lm_labels` or `next_sentence_label` is `None`: Outputs a tuple comprising
- the masked language modeling logits, and
- the next sentence classification logits.
#### 3. `BertForMaskedLM`
`BertForMaskedLM` includes the `BertModel` Transformer followed by the (possibly) pre-trained masked language modeling head.
*Inputs* comprise the inputs of the [`BertModel`](#-1.-`BertModel`) class plus an optional label:
- `masked_lm_labels`: masked language modeling labels: a torch.LongTensor of shape [batch_size, sequence_length] with indices selected in [-1, 0, ..., vocab_size]. All labels set to -1 are ignored (masked); the loss is only computed for the labels set in [0, ..., vocab_size].
*Outputs*:
- if `masked_lm_labels` is not `None`: Outputs the masked language modeling loss.
- if `masked_lm_labels` is `None`: Outputs the masked language modeling logits.
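For illustration, a hedged sketch of using `BertForMaskedLM` without labels (so logits are returned), under the old `pytorch_pretrained_bert` API; the sentence and masked position are arbitrary:
```python
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

tokens = tokenizer.tokenize("who was jim henson ? jim henson was a puppeteer")
masked_index = 6
tokens[masked_index] = '[MASK]'
tokens_tensor = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    # predictions: [batch_size, sequence_length, vocab_size] masked-LM logits
    predictions = model(tokens_tensor)

predicted_index = torch.argmax(predictions[0, masked_index]).item()
print(tokenizer.convert_ids_to_tokens([predicted_index]))
```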
#### 4. `BertForNextSentencePrediction`
`BertForNextSentencePrediction` includes the `BertModel` Transformer followed by the next sentence classification head.
*Inputs* comprise the inputs of the [`BertModel`](#-1.-`BertModel`) class plus an optional label:
- `next_sentence_label`: next sentence classification label: a torch.LongTensor of shape [batch_size] with indices selected in [0, 1]. 0 => the next sentence is the continuation, 1 => the next sentence is a random sentence.
*Outputs*:
- if `next_sentence_label` is not `None`: Outputs the next sentence classification loss.
- if `next_sentence_label` is `None`: Outputs the next sentence classification logits.
#### 5. `BertForSequenceClassification`
`BertForSequenceClassification` is a fine-tuning model that includes `BertModel` and a sequence-level (sequence or pair of sequences) classifier on top of the `BertModel`.
The sequence-level classifier is a linear layer that takes as input the last hidden state of the first token (`[CLS]`) of the input sequence (see Figures 3a and 3b in the BERT paper).
An example on how to use this class is given in the `run_classifier.py` script which can be used to fine-tune a single sequence (or pair of sequence) classifier using BERT, for example for the MRPC task.
#### 6. `BertForQuestionAnswering`
`BertForQuestionAnswering` is a fine-tuning model that includes `BertModel` with a token-level classifier on top of the full sequence of last hidden states.
The token-level classifier takes as input the full sequence of the last hidden states and computes several (e.g. two) scores for each token, which can, for example, respectively be the score that a given token is a `start_span` or an `end_span` token (see Figures 3c and 3d in the BERT paper).
An example on how to use this class is given in the `run_squad.py` script which can be used to fine-tune a token classifier using BERT, for example for the SQuAD task.
### Tokenizer: `BertTokenizer`
`BertTokenizer` performs end-to-end tokenization, i.e. basic tokenization followed by WordPiece tokenization.
This class has two arguments:
- `vocab_file`: path to a vocabulary file.
- `do_lower_case`: convert text to lower-case while tokenizing. **Default = True**.
and three methods:
- `tokenize(text)`: converts a `str` into a list of `str` tokens by (1) performing basic tokenization and (2) WordPiece tokenization.
- `convert_tokens_to_ids(tokens)`: converts a list of `str` tokens into a list of `int` indices in the vocabulary.
- `convert_ids_to_tokens(ids)`: converts a list of `int` indices into a list of `str` tokens from the vocabulary.
Please refer to the doc strings and code in [`tokenization.py`](./pytorch_pretrained_bert/tokenization.py) for the details of the `BasicTokenizer` and `WordpieceTokenizer` classes. In general it is recommended to use `BertTokenizer` unless you know what you are doing.
### Optimizer: `BertAdam`
`BertAdam` is a `torch.optim` optimizer adapted to be closer to the optimizer used in the TensorFlow implementation of BERT. The differences with the PyTorch Adam optimizer are the following:
- BertAdam implements the weight decay fix,
- BertAdam doesn't compensate for bias as in the regular Adam optimizer.
The optimizer accepts the following arguments:
- `lr`: learning rate
- `warmup`: portion of `t_total` for the warmup, `-1` means no warmup. Default: `-1`
- `t_total`: total number of training steps for the learning rate schedule, `-1` means constant learning rate. Default: `-1`
- `schedule`: schedule to use for the warmup (see above). Default: `'warmup_linear'`
- `max_grad_norm`: maximum norm for the gradients (`-1` means no clipping). Default: `1.0`
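Putting these arguments together, a minimal sketch of constructing the optimizer (the model, learning rate, warmup proportion and step count are illustrative placeholders):
```python
from pytorch_pretrained_bert import BertModel
from pytorch_pretrained_bert.optimization import BertAdam

model = BertModel.from_pretrained('bert-base-uncased')

# Total number of optimizer updates planned for the run (illustrative value)
num_train_steps = 1000

optimizer = BertAdam(model.parameters(),
                     lr=2e-5,        # learning rate
                     warmup=0.1,     # first 10% of steps are linear warmup
                     t_total=num_train_steps)
```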
## Examples
| Sub-section | Description |
|-|-|
| [Training large models: introduction, tools and examples](#Training-large-models-introduction,-tools-and-examples) | How to use gradient-accumulation, multi-gpu training, distributed training, optimize on CPU and 16-bits training to train Bert models |
| [Fine-tuning with BERT: running the examples](#Fine-tuning-with-BERT-running-the-examples) | Running the examples in [`./examples`](./examples/): `extract_classif.py`, `run_classifier.py` and `run_squad.py` |
| [Fine-tuning BERT-large on GPUs](#Fine-tuning-BERT-large-on-GPUs) | How to fine tune `BERT large`|
### Training large models: introduction, tools and examples
BERT-base and BERT-large are respectively 110M- and 340M-parameter models and it can be difficult to fine-tune them on a single GPU with the recommended batch size for good performance (in most cases a batch size of 32).
To help with fine-tuning these models, we have included five techniques that you can activate in the fine-tuning scripts `run_classifier.py` and `run_squad.py`: gradient accumulation, multi-GPU training, distributed training, optimization on CPU and 16-bit training. For more details on how to use these techniques you can read [the tips on training large batches in PyTorch](https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255) that I published earlier this month.
Here is how to use these techniques in our scripts:
- **Gradient Accumulation**: Gradient accumulation can be used by supplying an integer greater than 1 to the `--gradient_accumulation_steps` argument. The batch at each step will be divided by this integer and gradients will be accumulated over `gradient_accumulation_steps` steps.
- **Multi-GPU**: Multi-GPU training is automatically activated when several GPUs are detected, and the batches are split across the GPUs.
- **Distributed training**: Distributed training can be activated by supplying an integer greater or equal to 0 to the `--local_rank` argument (see below).
- **Optimize on CPU**: The Adam optimizer stores two moving averages of the weights of the model. If you keep them on GPU 1 (typical behavior), your first GPU will have to store 3 times the size of the model. This is not optimal for large models like `BERT-large` and means your batch size is a lot lower than it could be. This option performs the optimization and stores the averages on the CPU/RAM to free more room on the GPU(s). As the most computationally intensive operation is usually the backward pass, this doesn't have a significant impact on the training time. Activate this option with `--optimize_on_cpu` on the `run_squad.py` script.
- **16-bit training**: 16-bit training, also called mixed-precision training, can reduce the memory requirement of your model on the GPU by using half-precision training, basically allowing you to double the batch size. If you have a recent GPU (starting from the NVIDIA Volta architecture) you should see no decrease in speed. A good introduction to mixed-precision training can be found [here](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/) and the full documentation is [here](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html). In our scripts, this option can be activated by setting the `--fp16` flag, and you can play with loss scaling using the `--loss_scaling` flag (see the previously linked documentation for details on loss scaling). If the loss scaling is too high (`NaN` in the gradients) it will be automatically scaled down until the value is acceptable. The default loss scaling is 128, which behaved nicely in our tests.
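For illustration only, here is what an invocation combining several of these options on `run_squad.py` might look like; the hyper-parameter values are placeholders, not recommended settings, and the exact set of required arguments depends on the script version:
```bash
python run_squad.py \
  --bert_model bert-large-uncased \
  --do_train \
  --train_batch_size 24 \
  --gradient_accumulation_steps 4 \
  --fp16 \
  --optimize_on_cpu \
  --output_dir /tmp/squad_output/
```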
Note: To use *Distributed Training*, you will need to run one training script on each of your machines. This can be done, for example, by running the following command on each server (see [the above-mentioned blog post](https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255) for more details):
```bash
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=2 --node_rank=$THIS_MACHINE_INDEX --master_addr="192.168.1.1" --master_port=1234 run_classifier.py (--arg1 --arg2 --arg3 and all other arguments of the run_classifier script)
```
Where `$THIS_MACHINE_INDEX` is a sequential index assigned to each of your machines (0, 1, 2...) and the machine with rank 0 has an IP address `192.168.1.1` and an open port `1234`.
### Fine-tuning with BERT: running the examples
We showcase the same examples as [the original implementation](https://github.com/google-research/bert/): fine-tuning a sequence-level classifier on the MRPC classification corpus and a token-level classifier on the question answering dataset SQuAD.
Before running these examples you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
and unpack it to some directory `$GLUE_DIR`. Please also download the `BERT-Base`
checkpoint, unzip it to some directory `$BERT_BASE_DIR`, and convert it to its PyTorch version as explained in the previous section.
This example code fine-tunes `BERT-Base` on the Microsoft Research Paraphrase
Corpus (MRPC) and runs in less than 10 minutes on a single K-80.
```shell
export GLUE_DIR=/path/to/glue
python run_classifier.py \
--task_name MRPC \
--do_train \
--do_eval \
--data_dir $GLUE_DIR/MRPC/ \
--bert_model bert-base-uncased \
--max_seq_length 128 \
--train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--output_dir /tmp/mrpc_output/
```
Our tests, run on a few seeds with [the original implementation's hyper-parameters](https://github.com/google-research/bert#sentence-and-sentence-pair-classification-tasks), gave evaluation results between 84% and 88%.
The second example fine-tunes `BERT-Base` on the SQuAD question answering task.
The data for SQuAD can be downloaded with the following links and should be saved in a `$SQUAD_DIR` directory.
The options we list above allow you to fine-tune BERT-large rather easily on GPU(s) instead of the TPU used by the original implementation.
For example, fine-tuning BERT-large on SQuAD can be done on a server with 4 K-80s (these are pretty old now) in 18 hours. Our results are similar to the TensorFlow implementation results (actually slightly higher):
We include [three Jupyter Notebooks](https://github.com/huggingface/pytorch-pretrained-BERT/tree/master/notebooks) that can be used to check that the predictions of the PyTorch model are identical to the predictions of the original TensorFlow model.
- The first NoteBook ([Comparing-TF-and-PT-models.ipynb](./notebooks/Comparing-TF-and-PT-models.ipynb)) extracts the hidden states of a full sequence on each layer of the TensorFlow and the PyTorch models and computes the standard deviation between them. In the given example, we get a standard deviation of 1.5e-7 to 9e-7 on the various hidden states of the models.
- The second NoteBook ([Comparing-TF-and-PT-models-SQuAD.ipynb](./notebooks/Comparing-TF-and-PT-models-SQuAD.ipynb)) compares the loss computed by the TensorFlow and the PyTorch models for identical initialization of the fine-tuning layer of the `BertForQuestionAnswering` and computes the standard deviation between them. In the given example, we get a standard deviation of 2.5e-7 between the models.
- The third NoteBook ([Comparing-TF-and-PT-models-MLM-NSP.ipynb](./notebooks/Comparing-TF-and-PT-models-MLM-NSP.ipynb)) compares the predictions computed by the TensorFlow and the PyTorch models for masked token language modeling using the pre-trained masked language modeling model.
Please follow the instructions given in the notebooks to run and modify them.
## Command-line interface
A command-line interface is provided to convert a TensorFlow checkpoint into a PyTorch dump of the `BertForPreTraining` class (see above).
You can convert any TensorFlow checkpoint for BERT (in particular [the pre-trained models released by Google](https://github.com/google-research/bert#pre-trained-models)) in a PyTorch save file by using the [`./pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py`](convert_tf_checkpoint_to_pytorch.py) script.
This CLI takes as input a TensorFlow checkpoint (three files starting with `bert_model.ckpt`) and the associated configuration file (`bert_config.json`), and creates a PyTorch model for this configuration, loads the weights from the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can be imported using `torch.load()` (see examples in `extract_features.py`, `run_classifier.py` and `run_squad.py`).
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow checkpoint (the three files starting with `bert_model.ckpt`) but be sure to keep the configuration file (`bert_config.json`) and the vocabulary file (`vocab.txt`) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (`pip install tensorflow`). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained `BERT-Base Uncased` model:
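(A hedged sketch; the entry point, flag names and paths below are illustrative and may differ from the script version in your checkout.)
```bash
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12

python pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py \
  --tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
  --bert_config_file $BERT_BASE_DIR/bert_config.json \
  --pytorch_dump_path $BERT_BASE_DIR/pytorch_model.bin
```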
You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/bert#pre-trained-models).
## TPU
TPU support and pretraining scripts
TPUs are not supported by the current stable release of PyTorch (0.4.1). However, the next version of PyTorch (v1.0) should support training on TPU and is expected to be released soon (see the recent [official announcement](https://cloud.google.com/blog/products/ai-machine-learning/introducing-pytorch-across-google-cloud)).
We will add TPU support when this next release is published.
The original TensorFlow code further comprises two scripts for pre-training BERT: [create_pretraining_data.py](https://github.com/google-research/bert/blob/master/create_pretraining_data.py) and [run_pretraining.py](https://github.com/google-research/bert/blob/master/run_pretraining.py).
Since pre-training BERT is a particularly expensive operation that basically requires one or several TPUs to be completed in a reasonable amount of time (see details [here](https://github.com/google-research/bert#pre-training-with-bert)), we have decided to wait for the inclusion of TPU support in PyTorch to convert these pre-training scripts.
🤗 Transformers is tested on Python 3.6+, and PyTorch 1.1.0+ or TensorFlow 2.0+.
You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're
unfamiliar with Python virtual environments, check out the [user guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). Create a virtual environment with the version of Python you're going
to use and activate it.
Now, if you want to use 🤗 Transformers, you can install it with pip. If you'd like to play with the examples, you
must install it from source.
## Installation with pip
First you need to install one of, or both, TensorFlow 2.0 and PyTorch.
Please refer to the [TensorFlow installation page](https://www.tensorflow.org/install/pip#tensorflow-2.0-rc-is-available) and the [PyTorch installation page](https://pytorch.org/get-started/locally/#start-locally) regarding the specific install command for your platform.
A couple of changes were introduced when the switch from version 3 to version 4 was done. Below is a summary of the
expected changes:
#### 1. AutoTokenizers and pipelines now use fast (rust) tokenizers by default.
The python and rust tokenizers have roughly the same API, but the rust tokenizers have a more complete feature set.
This introduces two breaking changes:
- The handling of overflowing tokens between the python and rust tokenizers is different.
- The rust tokenizers do not accept integers in the encoding methods.
##### How to obtain the same behavior as v3.x in v4.x
- The pipelines now contain additional features out of the box. See the [token-classification pipeline with the `grouped_entities` flag](https://huggingface.co/transformers/main_classes/pipelines.html?highlight=textclassification#tokenclassificationpipeline).
- The auto-tokenizers now return rust tokenizers. In order to obtain the python tokenizers instead, the user may use the `use_fast` flag by setting it to `False`:
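A minimal sketch of doing so (the checkpoint name is an illustrative choice, not part of the original guide):
```python
from transformers import AutoTokenizer

# Returns a python (slow) tokenizer instead of the rust (fast) one
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)
```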
#### 2. SentencePiece is removed from the required dependencies
The requirement on the SentencePiece dependency has been lifted from the `setup.py`. This is done so that we may have a channel on anaconda cloud without relying on `conda-forge`. This means that the tokenizers that depend on the SentencePiece library will not be available with a standard `transformers` installation.
This includes the **slow** versions of:
- `XLNetTokenizer`
- `AlbertTokenizer`
- `CamembertTokenizer`
- `MBartTokenizer`
- `PegasusTokenizer`
- `T5Tokenizer`
- `ReformerTokenizer`
- `XLMRobertaTokenizer`
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, you should install `sentencepiece` additionally:
In version `v3.x`:
```bash
pip install transformers
```
to obtain the same in version `v4.x`:
```bash
pip install transformers[sentencepiece]
```
or
```bash
pip install transformers sentencepiece
```
#### 3. The architecture of the repo has been updated so that each model resides in its folder
The past and foreseeable addition of new models means that the number of files in the directory `src/transformers` keeps growing and becomes harder to navigate and understand. We made the choice to put each model and the files accompanying it in their own sub-directories.
This is a breaking change as importing intermediary layers using a model's module directly needs to be done via a different path.
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, you should update the path used to access the layers.
In version `v3.x`:
```python
from transformers.modeling_bert import BertLayer
```
to obtain the same in version `v4.x`:
```python
from transformers.models.bert.modeling_bert import BertLayer
```
#### 4. Switching the `return_dict` argument to `True` by default
The [`return_dict` argument](https://huggingface.co/transformers/main_classes/output.html) enables the return of dict-like python objects containing the model outputs, instead of the standard tuples. This object is self-documented as keys can be used to retrieve values, while also behaving as a tuple as users may retrieve objects by index or by slice.
This is a breaking change: unlike a plain tuple, the returned object cannot be unpacked, so `value0, value1 = outputs` will no longer work.
##### How to obtain the same behavior as v3.x in v4.x
In order to obtain the same behavior as version `v3.x`, set the `return_dict` argument to `False`, either in the model configuration or during the forward pass:
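As a sketch of both options, using `BertModel` as an arbitrary example:
```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
inputs = tokenizer("Hello world", return_tensors="pt")

# Option 1: disable dict-like outputs for a single forward pass
model = BertModel.from_pretrained("bert-base-cased")
outputs = model(**inputs, return_dict=False)  # plain tuple, as in v3.x

# Option 2: disable dict-like outputs in the model configuration
model = BertModel.from_pretrained("bert-base-cased", return_dict=False)
outputs = model(**inputs)  # also a plain tuple
```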
#### 5. Removed some deprecated attributes
Attributes that had been deprecated for at least a month have been removed. The full list of deprecated attributes can be found in [#8604](https://github.com/huggingface/transformers/pull/8604).
Here is a list of these attributes/methods/arguments and what their replacements should be:
In several models, the labels become consistent with the other models (see the sketch after this list):
- `masked_lm_labels` becomes `labels` in `AlbertForMaskedLM` and `AlbertForPreTraining`.
- `masked_lm_labels` becomes `labels` in `BertForMaskedLM` and `BertForPreTraining`.
- `masked_lm_labels` becomes `labels` in `DistilBertForMaskedLM`.
- `masked_lm_labels` becomes `labels` in `ElectraForMaskedLM`.
- `masked_lm_labels` becomes `labels` in `LongformerForMaskedLM`.
- `masked_lm_labels` becomes `labels` in `MobileBertForMaskedLM`.
- `masked_lm_labels` becomes `labels` in `RobertaForMaskedLM`.
- `lm_labels` becomes `labels` in `BartForConditionalGeneration`.
- `lm_labels` becomes `labels` in `GPT2DoubleHeadsModel`.
- `lm_labels` becomes `labels` in `OpenAIGPTDoubleHeadsModel`.
- `lm_labels` becomes `labels` in `T5ForConditionalGeneration`.
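A minimal sketch of the rename, using `BertForMaskedLM` (the input sentence is arbitrary):
```python
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")

inputs = tokenizer("The capital of France is Paris.", return_tensors="pt")
labels = inputs["input_ids"].clone()

# v3.x: outputs = model(**inputs, masked_lm_labels=labels)
# v4.x: the argument is simply called `labels`
outputs = model(**inputs, labels=labels)
loss = outputs.loss
```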
In several models, the caching mechanism becomes consistent with the other models (a short GPT-2 sketch follows the list):
- `decoder_cached_states` becomes `past_key_values` in all BART-like, FSMT and T5 models.
- `decoder_past_key_values` becomes `past_key_values` in all BART-like, FSMT and T5 models.
- `past` becomes `past_key_values` in all CTRL models.
- `past` becomes `past_key_values` in all GPT-2 models.
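For instance, with GPT-2 (a sketch; the prompt is arbitrary):
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

outputs = model(**tokenizer("Hello", return_tensors="pt"), use_cache=True)

# v3.x: the cache was passed back through the `past` argument
# v4.x: it is now called `past_key_values`
next_ids = tokenizer(" world", return_tensors="pt").input_ids
outputs = model(next_ids, past_key_values=outputs.past_key_values)
```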
Regarding the tokenizer classes (a short sketch follows the list):
- The tokenizer attribute `max_len` becomes `model_max_length`.
- The tokenizer attribute `return_lengths` becomes `return_length`.
- The tokenizer encoding argument `is_pretokenized` becomes `is_split_into_words`.
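A sketch of the renamed tokenizer attributes and arguments (the inputs are placeholders):
```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")

# v3.x: tokenizer(words, is_pretokenized=True, return_lengths=True) and tokenizer.max_len
# v4.x: the renamed equivalents
encoding = tokenizer(["Hello", "world"], is_split_into_words=True, return_length=True)
print(encoding["length"], tokenizer.model_max_length)
```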
Regarding the `Trainer` class (a sketch of the `tb_writer` migration follows the list):
- The `Trainer` argument `tb_writer` is removed in favor of the callback `TensorBoardCallback(tb_writer=...)`.
- The `Trainer` argument `prediction_loss_only` is removed in favor of the class argument `args.prediction_loss_only`.
- The `Trainer` attribute `data_collator` should be a callable.
- The `Trainer` method `_log` is deprecated in favor of `log`.
- The `Trainer` method `_training_step` is deprecated in favor of `training_step`.
- The `Trainer` method `_prediction_loop` is deprecated in favor of `prediction_loop`.
- The `Trainer` method `is_local_master` is deprecated in favor of `is_local_process_zero`.
- The `Trainer` method `is_world_master` is deprecated in favor of `is_world_process_zero`.
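A sketch of the `tb_writer` migration mentioned above, assuming `tensorboard` is installed; the model name and paths are placeholders:
```python
from torch.utils.tensorboard import SummaryWriter
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
from transformers.integrations import TensorBoardCallback

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
args = TrainingArguments(output_dir="output")

# v3.x: Trainer(model=model, args=args, tb_writer=SummaryWriter("runs"))
# v4.x: pass a TensorBoardCallback wrapping the writer instead
trainer = Trainer(
    model=model,
    args=args,
    callbacks=[TensorBoardCallback(tb_writer=SummaryWriter("runs"))],
)
```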
Regarding the `TFTrainer` class:
- The `TFTrainer` argument `prediction_loss_only` is removed in favor of the class argument `args.prediction_loss_only`.
- The `TFTrainer` method `_log` is deprecated in favor of `log`.
- The `TFTrainer` method `_prediction_loop` is deprecated in favor of `prediction_loop`.
- The `TFTrainer` method `_setup_wandb` is deprecated in favor of `setup_wandb`.
- The `TFTrainer` method `_run_model` is deprecated in favor of `run_model`.
Regarding the `TrainingArguments` class:
- The `TrainingArguments` argument `evaluate_during_training` is deprecated in favor of `evaluation_strategy`.
Regarding the Transfo-XL model:
- The Transfo-XL configuration attribute `tie_weight` becomes `tie_words_embeddings`.
- The Transfo-XL modeling method `reset_length` becomes `reset_memory_length`.
Regarding pipelines (an example follows the list):
- The `FillMaskPipeline` argument `topk` becomes `top_k`.
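For example (the model name and sentence are arbitrary):
```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilroberta-base")

# v3.x: fill_mask("Paris is the <mask> of France.", topk=3)
# v4.x: the argument is now spelled `top_k`
predictions = fill_mask("Paris is the <mask> of France.", top_k=3)
```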
## Migrating from pytorch-transformers to 🤗 Transformers
Here is a quick summary of what you should take care of when migrating from `pytorch-transformers` to 🤗 Transformers.
### Positional order of some models' keywords inputs (`attention_mask`, `token_type_ids`...) changed
To be able to use Torchscript (see #1010, #1204 and #1195), the specific order of some models' **keyword inputs** (`attention_mask`, `token_type_ids`...) has been changed.
If you used to call the models with keyword arguments, e.g. `model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)`, this should not cause any change.
If you used to call the models with positional inputs for those arguments, e.g. `model(input_ids, attention_mask, token_type_ids)`, you may have to double-check the exact order of the input arguments.
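A short sketch illustrating the difference, using BERT as an arbitrary example:
```python
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")
enc = tokenizer("Hello world", return_tensors="pt")

# Safe: keyword arguments are matched by name, so the reordering has no effect.
outputs = model(enc["input_ids"], attention_mask=enc["attention_mask"], token_type_ids=enc["token_type_ids"])

# Positional calls such as model(input_ids, attention_mask, token_type_ids) rely on the
# current signature order, so double-check them against the model's forward() signature.
```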
## Migrating from pytorch-pretrained-bert
Here is a quick summary of what you should take care of when migrating from `pytorch-pretrained-bert` to 🤗 Transformers.
### Models always output `tuples`
The main breaking change when migrating from `pytorch-pretrained-bert` to 🤗 Transformers is that the models' forward method always outputs a `tuple` with various elements depending on the model and the configuration parameters.
The exact content of the tuples for each model is detailed in the models' docstrings and the [documentation](https://huggingface.co/transformers/).
In pretty much every case, you will be fine by taking the first element of the output as the output you previously used in `pytorch-pretrained-bert`.
Here is a `pytorch-pretrained-bert` to 🤗 Transformers conversion example for a `BertForSequenceClassification` classification model:
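A minimal sketch of such a conversion (the input sentence and label are arbitrary; the old call is shown only as a comment):
```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

inputs = tokenizer("This movie was great!", return_tensors="pt")
labels = torch.tensor([1])

# pytorch-pretrained-bert returned the loss directly when labels were provided:
#   loss = model(input_ids, labels=labels)
# 🤗 Transformers returns a tuple-like output; the loss is its first element:
outputs = model(**inputs, labels=labels)
loss = outputs[0]
logits = outputs[1]
```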
### Serialization
Breaking change in the `from_pretrained()` method:
1. Models are now set in evaluation mode by default when instantiated with the `from_pretrained()` method. To train them, don't forget to set them back in training mode (`model.train()`) to activate the dropout modules.
2. The additional `*inputs` and `**kwargs` arguments supplied to the `from_pretrained()` method used to be passed directly to the underlying model's `__init__()` method. They are now used to update the model configuration attribute first, which can break derived model classes built based on the previous `BertForSequenceClassification` examples. More precisely, the positional arguments `*inputs` provided to `from_pretrained()` are forwarded directly to the model `__init__()` method, while the keyword arguments `**kwargs` (i) which match configuration class attributes are used to update said attributes and (ii) which don't match any configuration class attribute are forwarded to the model `__init__()` method.
Also, while not a breaking change, the serialization methods have been standardized and you probably should switch to the new method `save_pretrained(save_directory)` if you were using any other serialization method before.
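For example (the directory path is a placeholder):
```python
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# save model and tokenizer to a directory...
model.save_pretrained("./my_saved_model")
tokenizer.save_pretrained("./my_saved_model")

# ...and reload them later from that same directory
model = BertForSequenceClassification.from_pretrained("./my_saved_model")
tokenizer = BertTokenizer.from_pretrained("./my_saved_model")
```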
### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules
The two optimizers previously included, `BertAdam` and `OpenAIAdam`, have been replaced by a single `AdamW` optimizer which has a few differences:
- it only implements weight decay correction,
- schedules are now externals (see below),
- gradient clipping is now also external (see below).
The new `AdamW` optimizer matches the PyTorch `Adam` optimizer API and lets you use standard PyTorch or apex methods for the schedule and clipping.
The schedules are now standard [PyTorch learning rate schedulers](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) and not part of the optimizer anymore.
Here is a conversion example from `BertAdam` with a linear warmup and decay schedule to `AdamW` with the same schedule:
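A sketch of that conversion; `train_dataloader` is a placeholder for your own `DataLoader` yielding dicts of tensors that include `labels`, and the old `BertAdam` call appears only as a comment:
```python
import torch
from transformers import AdamW, BertForSequenceClassification, get_linear_schedule_with_warmup

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
lr, max_grad_norm = 1e-3, 1.0
num_training_steps, num_warmup_steps = 1000, 100

# Previously, the schedule and gradient clipping lived inside BertAdam:
#   optimizer = BertAdam(model.parameters(), lr=lr, schedule="warmup_linear",
#                        warmup=num_warmup_steps / num_training_steps, t_total=num_training_steps)

# Now the optimizer, the schedule and the clipping are separate:
optimizer = AdamW(model.parameters(), lr=lr, correct_bias=False)  # correct_bias=False mimics BertAdam
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps
)

for batch in train_dataloader:  # placeholder DataLoader
    loss = model(**batch).loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # clipping is now external
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```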